Search Results: "morph"

24 September 2015

Joachim Breitner: The Incredible Proof Machine

In a few weeks, I will have the opportunity to offer a weekend workshop to selected and motivated high school students¹ to a topic of my choice. My idea is to tell them something about logic, proofs, and the joy of searching and finding proofs, and the gratification of irrevocable truths. While proving things on paper is already quite nice, it is much more fun to use an interactive theorem prover, such as Isabelle, Coq or Agda: You get immediate feedback, you can experiment and play around if you are stuck, and you get lots of small successes. Someone² once called interactive theorem proving the worlds most geekiest videogame . Unfortunately, I don t think one can get high school students without any prior knowledge in logic, or programming, or fancy mathematical symbols, to do something meaningful with a system like Isabelle, so I need something that is (much) easier to use. I always had this idea in the back of my head that proving is not so much about writing text (as in normally written proofs) or programs (as in Agda) or labeled statements (as in Hilbert-style proofs), but rather something involving facts that I have proven so far floating around freely, and way to combine these facts to new facts, without the need to name them, or put them in a particular order or sequence. In a way, I m looking for labVIEW wrestled through the Curry-Horward-isomorphism. Something like this:

A proof of implication currying

So I set out, rounded up a few contributors (Thanks!), implemented this, and now I proudly present: The Incredible Proof Machine ³ This interactive theorem prover allows you to do perform proofs purely by dragging blocks (representing proof steps) onto the paper and connecting them properly. There is no need to learn syntax, and hence no frustration about getting that wrong. Furthermore, it comes with a number of example tasks to experiment with, so you can simply see it as a challenging computer came and work through them one by one, learning something about the logical connectives and how they work as you go. For the actual workshop, my plan is to let the students first try to solve the tasks of one session on their own, let them draw their own conclusions and come up with an idea of what they just did, and then deliver an explanation of the logical meaning of what they did. The implementation is heavily influenced by Isabelle: The software does not know anything about, say, conjunction ( ) and implication ( ). To the core, everything is but an untyped lambda expression, and when two blocks are connected, it does unification⁴ of the proposition present on either side. This general framework is then instantiated by specifying the basic rules (or axioms) in a descriptive manner. It is quite feasible to implement other logics or formal systems on top of this as well. Another influence of Isabelle is the non-linear editing: You neither have to create the proof in a particular order nor have to manually manage a proof focus . Instead, you can edit any bit of the proof at any time, and the system checks all of it continuously. As always, I am keen on feedback. Also, if you want to use this for your own teaching or experimenting needs, let me know. We have a mailing list for the project, the code is on GitHub, where you can also file bug reports and feature requests. Contributions are welcome! All aspects of the logic are implemented in Haskell and compiled to JavaScript using GHCJS, the UI is plain hand-written and messy JavaScript code, using JointJS to handle the graph interaction. Obviously, there is still plenty that can be done to improve the machine. In particular, the ability to create your own proof blocks, such as proof by contradiction, prove them to be valid and then use them in further proofs, is currently being worked on. And while the page will store your current progress, including all proofs you create, in your browser, it needs better ways to save, load and share tasks, blocks and proofs. Also, we d like to add some gamification, i.e. achievements ( First proof by contradiction , 50 theorems proven ), statistics, maybe a share theorem on twitter button. As the UI becomes more complicated, I d like to investigating moving more of it into Haskell world and use Functional Reactive Programming, i.e. Ryan Trickle s reflex, to stay sane. Customers who liked The Incredible Proof Machine might also like these artifacts, that I found while looking whether something like this exists:

Easyprove, an interactive tool to create textual proofs by clicking on rules.
Domino On Acid represents natural deduction rules in propositional logic with and as a game of dominoes.
Proofscape visualizes the dependencies between proofs as graphs, i.e. it operates on a higher level than The Incredible Proof Machine.
Proofmood is a nice interactive interface to conduct proofs in Fitch-style.
Proof-Game represents proofs trees in a sequent calculus with boxes with different shapes that have to match.
JAPE is an editor for proofs in a number of traditional proof styles. (Thanks to Alfio Martini for the pointer.)
Logitext, written by Edward Z. Yang, is an online tool to create proof trees in sequent style, with a slick interface, and is even backed by Coq! (Thanks to Lev Lamberov for the pointer.)
Carnap is similar in implementation to The Incredible Proof Machine (logical core in Haskell, generic unification-based solver). It currently lets you edit proof trees, but there are plans to create something more visual.
Clickable Proofs is a (non-free) iOS app that incorporates quite a few of the ideas that are behind The Incredible Proof Machine. It came out of a Bachelor s thesis of Tim Selier and covers propositional logic.
Euclid the game by Kasper Peulen is a nice game to play with geometric constructions.

Students with migration background supported by the START scholarship
Does anyone know the reference?
We almost named it Proofcraft , which would be a name our current Minecraft-wild youth would appreciate, but it is alreay taken by Gerwin Kleins blog. Also, the irony of a theorem prover being in-credible is worth something.
Luckily, two decades ago, Tobias Nipkow published a nice implementation of higher order pattern unification as ML code, which I transliterated to Haskell for this project.

31 March 2015

Zlatan Todori : Interviews with FLOSS developers: Francesca Ciceri

Debian and FLOSS community don't only occupy coding developers. They occupy people who write news, who talk about FLOSS, who help on booths and conferences, who create artistic forms of the community and so many others that contribute in countless ways. A lady, that is doing many of that is Francesca Ciceri, known in Debian as MadameZou. She is non-packaging Debian Developer, a fearless warrior for diversity and a zombie fan. Although it sounds intimidating, she is deep caring and great human being. So, what has MadaZou to tell us? Picture of MadameZou

Who are you? My name is Francesca and I'm totally flattered by your intro. The fearless warrior part may be a bit exaggerated, though. What have you done and what are you currently working on in FLOSS world? I've been a Debian contributor since late 2009. My journey in Debian has touched several non-coding areas: from translation to publicity, from videoteam to www. I've been one of the www.debian.org webmasters for a while, a press officer for the Project as well as an editor for DPN. I've dabbled a bit in font packaging, and nowadays I'm mostly working as a Front Desk member. Setup of your main machine? Wow, that's an intimate question! Lenovo Thinkpad, Debian testing. Describe your current most memorable situation as FLOSS member? Oh, there are a few. One awesome, tiring and very satisfying moment was during the release of Squeeze: I was member of the publicity and the www teams at the time, and we had to pull a 10 hours of team work to put everything in place. It was terrible and exciting at the same time. I shudder to think at the amount of work required from ftpmaster and release team during the release. Another awesome moment was my first Debconf: I was so overwhelmed by the sense of belonging in finally meeting all these people I've been worked remotely for so long, and embarassed by my poor English skills, and overall happy for just being there... If you are a Debian contributor I really encourage you to participate to Debian events, be they small and local or as big as DebConf: it really is like finally meeting family. Some memorable moments from Debian conferences? During DC11, the late nights with the "corridor cabal" in the hotel, chatting about everything. A group expedition to watch shooting stars in the middle of nowhere, during DC13. And a very memorable videoteam session: it was my first time directing and everything that could go wrong, went wrong (including the speaker deciding to take a walk outside the room, to demonstrate something, out of the cameras range). It was a disaster, but also fun: at the end of it, all the video crew was literally in stitches. But there are many awesome moments, almost too many to recall. Each conference is precious on that regard: for me the socializing part is extremely important, it's what cements relationships and help remote work go smoothly, and gives you motivation to volunteer in tasks that sometimes are not exactly fun. You are known as Front Desk member for DebConf's - what work does it occupy and why do you enjoy doing it? I'm not really a member of the team: just one of Nattie's minions! You had been also part of DebConf Video team - care to share insights into video team work and benefits it provides to Debian Project? The video team work is extremely important: it makes possible for people not attending to follow the conference, providing both live streaming and recording of all talks. I may be biased, but I think that DebConf video coverage and the high quality of the final recordings are unrivaled among FLOSS conferences - especially since it's all volunteer work and most of us aren't professional in the field. During the conference we take shifts in filming the various talks - for each talk we need approximately 4 volunteers: two camera operators, a sound mixer and the director. After the recording, comes the boring part: reviewing, cutting and sometimes editing the videos. It's a long process and during the conference, you can sometimes spot the videoteam members doing it at night in the hacklab, exhausted after a full day of filming. And then, the videos are finally ready to be uploaded, for your viewing pleasure. During the last years this process has become faster thanks to the commitment of many volunteers, so that now you have to wait only few days, sometimes a week, after the end of the conference to be able to watch the videos. I personally love to contribute to the videoteam: you get to play with all that awesome gear and you actually make a difference for all the people who cannot attend in person. You are also non-packaging Debian Developer - how does that feel like? Feels awesome! The mere fact that the Debian Project decided - in 2009 via a GR - to recognize the many volunteers who contribute without doing packaging work is a great show of inclusiveness, in my opinion. In a big project like Debian just packaging software is not enough: the final result relies heavily on translators, sysadmins, webmasters, publicity people, event organizers and volunteers, graphic artists, etc. It's only fair that these contributions are deemed as valuable as the packaging, and to give an official status to those people. I was one of the firsts non-uploading DD, four years ago, and for a long time it was just really an handful of us. In the last year I've seen many others applying for the role and that makes me really happy: it means that finally the contributors have realized that they deserve to be an official part of Debian and to have "citizenship rights" in the project. You were the leading energy on Debian's diversity statement - what gave you the energy to drive into it? It seemed the logical conclusion of the extremely important work that Debian Women had done in the past. When I first joined Debian, in 2009, as a contributor, I was really surprised to find a friendly community and to not be discriminated on account of my gender or my lack of coding skills. I may have been just lucky, landing in particularly friendly teams, but my impression is that the project has been slowly but unequivocally changed by the work of Debian Women, who raised first the need for inclusiveness and the awareness about the gender problem in Debian. I don't remember exactly how I stumbled upon the fact that Debian didn't have a Diversity Statement, but at first I was very surprised by it. I asked zack (Stefano Zacchiroli), who was DPL at the time, and he encouraged me to start a public discussion about it, sending out a draft - and helped me all the way along the process. It took some back and forth in the debian-project mailing list, but the only thing needed was actually just someone to start the process and try to poke the discussion when it stalled - the main blocker was actually about the wording of the statement. I learned a great deal from that experience, and I think it changed completely my approach in things like online discussions and general communication within the project. At the end of the day, what I took from that is a deep respect for who participated and the realization that constructive criticism does require certainly a lot of work for all parts involved, but can happen. As for the statement in itself: these things are as good as you keep them alive with best practices, but I think that are better stated explicitly rather than being left unsaid. You are involved also with another Front Desk, the Debian's one which is involved with Debian's New Members process - what are tasks of that FD and how rewarding is the work on it? The Debian Front Desk is the team that runs the New Members process: we receive the applications, we assign the applicant a manager, and we verify the final report. In the last years the workflow has been simplified a lot by the re-design of the nm.debian.org website, but it's important to keep things running smoothly so that applicants don't have too lenghty processes or to wait too much before being assigned a manager. I've been doing it for a less more than a month, but it's really satisfying to usher people toward DDship! So this is how I feel everytime I send a report over to DAM for an applicant to be accepted as new Debian Developer: Crazy pic

How do you see future of Debian development? Difficult to say. What I can say is that I'm pretty sure that, whatever the technical direction we'll take, Debian will remain focused on excellence and freedom. What are your future plans in Debian, what would you like to work on? Definetely bug wrangling: it's one of the thing I do best and I've not had a chance to do that extensively for Debian yet. Why should developers and users join Debian community? What makes Debian a great and happy place? We are awesome, that's why. We are strongly committed to our Social Contract and to users freedom, we are steadily improving our communication style and trying to be as inclusive as possible. Most of the people I know in Debian are perfectionists and outright brilliant in what they do. Joining Debian means working hard on something you believe, identifying with a whole project, meeting lots of wonderful people and learning new things. It ca be at times frustrating and exhausting, but it's totally worth it. You have been involved in Mozilla as part of OPW - care to share insights into Mozilla, what have you done and compare it to Debian? That has been a very good experience: it meant have the chance to peek into another community, learn about their tools and workflow and contribute in different ways. I was an intern for the Firefox QA team and their work span from setting up specific test and automated checks on the three version of Firefox (Stable, Aurora, Nightly) to general bug triaging. My main job was bug wrangling and I loved the fact that I was a sort of intermediary between developers and users, someone who spoke both languages and could help them work together. As for the comparison, Mozilla is surely more diverse than Debian: both in contributors and users. I'm not only talking demographic, here, but also what tools and systems are used, what kind of skills people have, etc. That meant reach some compromises with myself over little things: like having to install a proprietary tool used for the team meetings (and getting crazy in order to make it work with Debian) or communicating more on IRC than on mailing lists. But those are pretty much the challenges you have to face whenever you go out of your comfort zone . You are also volunteer of the Organization for Transformative Works - what is it, what work do you do and care to share some interesting stuff? OTW is a non profit organization to preserve fan history and cultures, created by fans. Its work range from legal advocacy and lobbying for fair use and copyright related issues, developing and maintaining AO3 -- a huge fanwork archive based on open-source software --, to the production of a peer-reviewed academic journal about fanworks. I'm an avid fanfiction reader and writer, and joining the OTW volunteers seemed a good way to give back to the community - in true Debian fashion . As a volunteer, I work for the Translation Committee: we are more than a hundred people - divided in several language teams - translating the OTW website, the interface of AO3 archive, newsletter, announcements and news posts. We have a orga-wide diversity statement, training for recruits, an ever growing set of procedures to smooth our workflow, monthly meetings and movie nights. It's an awesome group to work with. I'm deeply invested in this kind of work: both for the awesomeness of OTW people and for the big role that fandom and fanworks have in my life. What I find amazing is that the same concept we - as in the FLOSS ecosystem - apply to software can be applied to cultural production: taking a piece of art you love and expand, remix, explore it. Just for the fun of it. Protect and encourage the right to play in this cultural sandbox is IMO essential for our society. Most of the participants in the fandom come from marginalised group or minorities whose point of view is usually not part of the mainstream narratives. This makes the act of writing, remixing and re-interpreting a story not only a creative exercise but a revolutionary one. As Elizabeth Minkel says: "My preferred explanation is the idea that the vast majority of what we watch is from the male perspective authored, directed, and filmed by men, and mostly straight white men at that. Fan fiction gives women and other marginalised groups the chance to subvert that perspective, to fracture a story and recast it in her own way." In other words, "fandom is about putting debate and conversation back into an artistic process". On a personal side - you do a lot of DIY, handmade works. What have you done, what joy does it bring to you and share with us a picture of it? I like to think that the hacker in me morphs in a maker whenever I can actually manipulate stuff. The urge to explore ways of doing things, of create and change is probably the same. I've been blessed with curiousity and craftiness and I love to learn new DIY techniques: I cannot describe it, really, but if I don't make something for a while I actually feel antsy. I need to create stuff. Nowadays, I'm mostly designing and sewing clothes - preferably reproductions of dresses from the 40s and the 50s - and I'm trying to make a living of that. It's a nice challenge: there's a lot of research involved, as I always try to be historically accurate in design, sewing tecniques and material, and many hours of careful attention to details. I'm right in the process of make photoshoots for most of my period stuff, so I'll share with you something different: a t-shirt refashion done with the DebConf11 t-shirt! (here's the tutorial) T-shirt pic

12 March 2015

Erich Schubert: The sad state of sysadmin in the age of containers

System administration is in a sad state. It in a mess.

I'm not complaining about old-school sysadmins. They know how to keep systems running, manage update and upgrade paths.

This rant is about containers, prebuilt VMs, and the incredible mess they cause because their concept lacks notions of "trust" and "upgrades".

Consider for example Hadoop. Nobody seems to know how to build Hadoop from scratch. It's an incredible mess of dependencies, version requirements and build tools.

None of these "fancy" tools still builds by a traditional make command. Every tool has to come up with their own, incomptaible, and non-portable "method of the day" of building.

And since nobody is still able to compile things from scratch, everybody just downloads precompiled binaries from random websites. Often without any authentication or signature.

NSA and virus heaven. You don't need to exploit any security hole anymore. Just make an "app" or "VM" or "Docker" image, and have people load your malicious binary to their network.

The Hadoop Wiki Page of Debian is a typical example. Essentially, people have given up in 2010 to be able build Hadoop from source for Debian and offer nice packages.

To build Apache Bigtop, you apparently first have to install puppet3. Let it download magic data from the internet. Then it tries to run sudo puppet to enable the NSA backdoors (for example, it will download and install an outdated precompiled JDK, because it considers you too stupid to install Java.) And then hope the gradle build doesn't throw a 200 line useless backtrace.

I am not joking. It will try to execute commands such as e.g.

/bin/bash -c "wget http://www.scala-lang.org/files/archive/scala-2.10.3.deb ; dpkg -x ./scala-2.10.3.deb /"

Note that it doesn't even install the package properly, but extracts it to your root directory. The download does not check any signature, not even SSL certificates. (Source: Bigtop puppet manifests)

Even if your build would work, it will involve Maven downloading unsigned binary code from the internet, and use that for building.

Instead of writing clean, modular architecture, everything these days morphs into a huge mess of interlocked dependencies. Last I checked, the Hadoop classpath was already over 100 jars. I bet it is now 150, without even using any of the HBaseGiraphFlumeCrunchPigHiveMahoutSolrSparkElasticsearch (or any other of the Apache chaos) mess yet.

Stack is the new term for "I have no idea what I'm actually using".

Maven, ivy and sbt are the go-to tools for having your system download unsigned binary data from the internet and run it on your computer.

And with containers, this mess gets even worse.

Ever tried to security update a container?

Essentially, the Docker approach boils down to downloading an unsigned binary, running it, and hoping it doesn't contain any backdoor into your companies network.

Feels like downloading Windows shareware in the 90s to me.

When will the first docker image appear which contains the Ask toolbar? The first internet worm spreading via flawed docker images?

Back then, years ago, Linux distributions were trying to provide you with a safe operating system. With signed packages, built from a web of trust. Some even work on reproducible builds.

But then, everything got Windows-ized. "Apps" were the rage, which you download and run, without being concerned about security, or the ability to upgrade the application to the next version. Because "you only live once".

Update: it was pointed out that this started way before Docker: Docker is the new 'curl sudo bash' . That's right, but it's now pretty much mainstream to download and run untrusted software in your "datacenter". That is bad, really bad. Before, admins would try hard to prevent security holes, now they call themselves "devops" and happily introduce them to the network themselves!

11 February 2015

John Goerzen: Reactions to Has modern Linux lost its way? and the value of simplicity

Apparently I touched a nerve with my recent post about the growing complexity of issues. There were quite a few good comments, which I ll mention here. It s provided me some clarity on the problem, in fact. I ll try to distill a few more thoughts here. The value of simplicity and predictability The best software, whether it s operating systems or anything else, is predictable. You read the documentation, or explore the interface, and you can make a logical prediction that when I do action X, the result will be Y. grep and cat are perfect examples of this. The more complex the rules in the software, the more hard it is for us to predict. It leads to bugs, and it leads to inadvertant security holes. Worse, it leads to people being unable to fix things themselves one of the key freedoms that Free Software is supposed to provide. The more complex software is, the fewer people will be able to fix it by themselves. Now, I want to clarify: I hear a lot of talk about ease of use. Gnome removes options in my print dialog box to make it easier to use. (This is why I do not use Gnome. It actually makes it harder to use, because now I have to go find some obscure way to just make the darn thing print.) A lot of people conflate ease of use with ease of learning, but in reality, I am talking about neither. I am talking about ease of analysis. The Linux command line may not have pointy-clicky icons, but at least at one time once you understood ls -l and how groups, users, and permission bits interacted, you could fairly easily conclude who had access to what on a system. Now we have a situation where the answer to this is quite unclear in terms of desktop environments (apparently some distros ship network-manager so that all users on the system share the wifi passwords they enter. A surprise, eh?) I don t mind reading a manpage to learn about something, so long as the manpage was written to inform. With this situation of dbus/cgmanager/polkit/etc, here s what it feels like. This, to me, is the crux of the problem: It feels like we are in a twisty maze, every passage looks alike, and our flashlight ran out of battieries in 2013. The manpages, to the extent they exist for things like cgmanager and polkit, describe the texture of the walls in our cavern, but don t give us a map to the cave. Therefore, we are each left to piece it together little bits at a time, but there are traps that keep moving around, so it s slow going. And it s a really big cave. Other user perceptions There are a lot of comments on the blog about this. It is clear that the problem is not specific to Debian. For instance:

Christopher writes that on Fedora, annoying, niggling problems that used to be straightforward to isolate, diagnose and resolve by virtue of the platform s simple, logical architecture have morphed into a morass that s worse than the Windows Registry. Alessandro Perucchi adds that he s been using Linux for decades, and now his wifi doesn t work, suspend doesn t work, etc. in Fedora and he is surprisingly unable to fix it himself.
Nate bargman writes, in a really insightful comment, I do feel like as though I m becoming less a master of and more of a slave to the computing software I use. This is not a good thing.
Singh makes the valid point that this stuff is in such a state of flux that even if a person is one of the few dozen in the world that understand what goes into a session today, the knowledge will be outdated in 6 months. (Hal, anyone?)

This stuff is really important, folks. People being able to maintain their own software, work with it themselves, etc. is one of the core reasons that Free Software exists in the first place. It is a fundamental value of our community. For decades, we have been struggling for survival, for relevance. When I started using Linux, it was both a question and an accomplishment to have a useable web browser on many platforms. (Netscape Navigator was closed source back then.) Now we have succeeded. We have GPL-licensed and BSD-licensed software running on everything from our smartphones to cars. But we are snatching defeat from the jaws of victory, because just as we are managing to remove the legal roadblocks that kept people from true mastery of their software, we are erecting technological ones that make the step into the Free Software world so much more difficult than it needs to be. We no longer have to craft Modelines for X, or compile a kernel with just the right drivers. This is progress. Our hardware is mostly auto-detected and our USB serial dongles work properly more often on Linux than on Windows. This is progress. Even our printers and scanners work pretty darn well. This is progress, too. But in the place of all these things, now we have userspace mucking it up. We have people with mysterious errors that can t be easily assisted by the elders in the community, because the elders are just as mystified. We have bugs crop up that would once have been shallow, but are now non-obvious. We are going to leave a sour taste in people s mouth, and stir repulsion instead of interest among those just checking it out. The ways out It s a nasty predicament, isn t it? What are your ways out of that cave without being eaten by a grue? Obviously the best bet is to get rid of the traps and the grues. Somehow the people that are working on this need to understand that elegance is a feature a darn important feature. Sadly I think this ship may have already sailed. Software diagnosis tools like Enrico Zini s seat-inspect idea can also help. If we have something like an ls for polkit that can reduce all the complexity to something more manageable, that s great. The next best thing is a good map good manpages, detailed logs, good error messages. If software would be more verbose about the permission errors, people could get a good clue about where to look. If manpages for software didn t just explain the cavern wall texture, but explain how this room relates to all the other nearby rooms, it would be tremendously helpful. At present, I am unsure if our problem is one of very poor documentation, or is so bad that good documentation like this is impossible because the underlying design is so complex it defies being documented in something smaller than a book (in which case, our ship has not just sailed but is taking on water). Counter-argument: progress One theme that came up often in the comments is that this is necessary for progress. To a certain extent, I buy that. I get why udev is important. I get why we want the DE software to interact well. But here s my thing: this already worked well in wheezy. Gnome, XFCE, and KDE software all could mount/unmount my drives. I am truly still unsure what problem all this solved. Yes, cloud companies have demanding requirements about security. I work for one. Making security more difficult to audit doesn t do me any favors, I can assure you. The systemd angle To my surprise, systemd came up quite often in the discussion, despite the fact that I mentioned I wasn t running systemd-sysv. It seems like the new desktop environemt ecosystem is the systemd ecosystem in a lot of people s minds. I m not certain this is justified; systemd was not my first choice, but as I said in an earlier blog post, jessie will still boot . A final note I still run Debian on all my personal boxes and I m not going to change. It does awesome things. For under $100, I built a music-playing system, with Raspberry Pis, fully synced throughout my house, using a little scripting and software. The same thing from Sonos would have cost thousands. I am passionate about this community and its values. Even when jessie releases with polkit and all the rest, I m still going to use it, because it is still a good distro from good people.

22 December 2014

Erich Schubert: Java sum-of-array comparisons

This is a follow-up to the post by Daniel Lemire on a close topic.

Daniel Lemire hat experimented with boxing a primitive array in an interface, and has been trying to measure the cost.

I must admit I was a bit sceptical about his results, because I have seen Java successfully inlining code in various situations.

For an experimental library I occasionally work on, I had been spending quite a bit of time on benchmarking. Previously, I had used Google Caliper for it (I even wrote an evaluation tool for it to produce better statistics). However, Caliper hasn't seen much updates recently, and there is a very attractive similar tool at openJDK now, too: Java Microbenchmarking Harness (actually it can be used for benchmarking at other scale, too).

Now that I have experience in both, I must say I consider JMH superior, and I have switched over my microbenchmarks to it. One of the nice things is that it doesn't make this distinction of micro vs. macrobenchmarks, and the runtime of your benchmarks is easier to control.

I largely recreated his task using JMH. The benchmark task is easy: compute the sum of an array; the question is how much the cost is when allowing different data structures than double[].

My results, however, are quite different. And the statistics of JMH indicate the differences may be not significant, and thus indicating that Java manages to inline the code properly.

adapterFor       1000000  thrpt  50  836,898   13,223  ops/s
adapterForL      1000000  thrpt  50  842,464   11,008  ops/s
adapterForR      1000000  thrpt  50  810,343    9,961  ops/s
adapterWhile     1000000  thrpt  50  839,369   11,705  ops/s
adapterWhileL    1000000  thrpt  50  842,531    9,276  ops/s
boxedFor         1000000  thrpt  50  848,081    7,562  ops/s
boxedForL        1000000  thrpt  50  840,156   12,985  ops/s
boxedForR        1000000  thrpt  50  817,666    9,706  ops/s
boxedWhile       1000000  thrpt  50  845,379   12,761  ops/s
boxedWhileL      1000000  thrpt  50  851,212    7,645  ops/s
forSum           1000000  thrpt  50  845,140   12,500  ops/s
forSumL          1000000  thrpt  50  847,134    9,479  ops/s
forSumL2         1000000  thrpt  50  846,306   13,654  ops/s
forSumR          1000000  thrpt  50  831,139   13,519  ops/s
foreachSum       1000000  thrpt  50  843,023   13,397  ops/s
whileSum         1000000  thrpt  50  848,666   10,723  ops/s
whileSumL        1000000  thrpt  50  847,756   11,191  ops/s

The postfix is the iteration type: sum using for loops, with local variable for the length (L), or in reverse order (R); while loops (again with local variable for the length). The prefix is the data layout: the primitive array, the array using a static adapter (which is the approach I have been using in many implementations in cervidae) and using a "boxed" wrapper class around the array (roughly the approach that Daniel Lemire has been investigating. On the primitive array, I also included the foreach loop approach (for(double v:array) ).

If you look at the standard deviations, the results are pretty much identical, except for reverse loops. This is not surprising, given the strong inlining capabilities of Java - all of these codes will lead to next to the same CPU code after warmup and hotspot optimization.

I do not have a full explanation of the differences the others have been seeing. There is no "polymorphism" occurring here (at runtime) - there is only a single Array implementation in use; but this was the same with his benchmark.

Here is a visualization of the results (sorted by average):
Result boxplots

As you can see, most results are indiscernible. The measurement standard deviation is higher than the individual differences. If you run the same benchmark again, you will likely get a different ranking.

Note that performance may - drastically - drop once you use multiple adapters or boxing classes in the same hot codepath. Java Hotspot keeps statistics on the classes it sees, and as long as it only sees 1-2 different types, it performs quite aggressive optimizations instead of doing "virtual" method calls.

30 November 2014

Enrico Zini: cxx11-talk-notes

C++11 talk notes On 2014-11-27 I gave a talk about C++ and new features introduced with C++11: these are the talk notes. See cxx11-talk-examples for the examples. (note: I had to add U+2063 INVISIBLE SEPARATOR to prevent noreturn statements to be misinterpreted by the blog formatter. If you copypaste the code and encounter issues, you may want to delete the noreturn statements and retype them) Overview of programming languages

It has to be as fast as possible, so interpreted languages are out. You don't want to micro manage memory, so C is out. You don't want to require programmers to have a degree, so C++ is out. You want fast startup and not depend on a big runtime, so Java is out. [...] (Bram Moolenaar)

C++ secret cultist protip

Do not call up what you cannot put down.

C++ is a compiled language It is now possible to use the keyword constexpr to mark functions and objects that can be used at compile time:

/*
 * constexpr tells the compiler that a variable or function can be evaluated at
 * compile time.
 *
 * constexpr functions can also be run at run time, if they are called with
 * values not known at compile time.
 *
 * See http://en.cppreference.com/w/cpp/language/constexpr for more nice examples
 *
 * It can be used to avoid using constants in code, and using instead functions
 * for computing hardware bitfields or physical values, without losing in
 * efficiency.
 */
#include <iostream>
using namespace std;
constexpr int factorial(int n)
 
    return n <= 1 ? 1 : (n * factorial(n-1));
 
int main()
 
    cout << "Compile time factorial of 6: " << factorial(6) << endl;
    cout << "Enter a number: ";
    int a;
    cin >> a;
    cout << "Run time factorial of " << a << ": " << factorial(a) << endl;

See also this for more nice examples. See this and this for further discussion. Multiline strings

        const char* code = R"--(
          printf("foo\tbar\n");
          return 0;
        )--";

See this. C++ memory management protip RAII: Resource Acquisition Is Instantiation This is not new in C++11, but in my experience I have rarely seen it mentioned in C++ learning material, and it does make a huge difference in my code. See this and this for details. Constructors and member initializer lists Initializers in curly braces now have their own type: std::initializer_list:

#include <string>
#include <iostream>
#include <unordered_set>
using namespace std;
// std::initializer_list< >
//   will have as its value all the elements inside the curly braces
string join(initializer_list<string> strings)
 
    string res;
    for (auto str: strings)
     
        if (!res.empty())
            res += ", ";
        res += str;
     
    return res;
 
int main()
 
    unordered_set<string> blacklist  ".", "..", ".git", ".gitignore"  ;
    cout << join(  "antani", "blinda"  ) << endl;

See this for details, including the new uniform initialization trick of omitting parentesis in constructors so that you can call normal constructors and initializer_list constructors with the same syntax, which looks like an interesting thing when writing generic code in templates. Type inference I can now use auto instead of a type to let the compiler automatically compute the value of something I assign to:

        auto i = 3 + 2;
        // See also https://github.com/esseks/monicelli
        vector<string> names  "antani", "blinda", "supercazzola"  ;
        for (auto i = names.cbegin(); i != names.cend(); ++i)
            cout << i;
        template<typename T>
        T frobble(const T& stuff)
         
             // This will work whatever type is returned by stuff.read()
             auto i = stuff.read();
             //

See this for more details. Range-based for loop C++ now has an equivalent of the various foreach constructs found in several interpreted languages!

        for (auto i: list_of_stuff)
                cout << i << endl;
        for (auto n:  0,1,2,3,4,5 )
                cout << n << endl;
        // This construct:
        for (auto i: stuff)
        // If stuff is an array, it becomes:
        for (i = stuff, i < stuff + sizeof(stuff) / sizeof(stuff[0]); ++i)
        // If stuff has .begin() and .end() methods it becomes:
        for (i = stuff.begin(); i != stuff.end(); ++i)
        // Otherwise it becomes:
        for (i = begin(stuff); i != end(stuff); ++i)
        // And you can define begin() and end() functions for any type you
        // want, at any time

See this and this for details. Lambda functions and expressions Lambdas! Closures! Something like this:

// JavaScript
var add = function(a, b)   return a + b;

# Python
add = lambda a, b: a + b

Becomes this:

auto add = [](int a, int b)   return a + b;

And something like this:

// JavaScript
var a = 0;
$.each([1, 2, 3, 4], function(idx, el)   a += el  );

Becomes this:

unsigned a = 0;
std::for_each(  1, 2, 3, 4  , [&a](int el)   return a += el;  );

See this, this and this. Tuple types C++ now has a std::tuple type, that like in Python can be used to implement functions that return multiple values:

        tuple<int, string, vector<string>> parse_stuff()
         
                return make_tuple(id, name, values);
         
        string name; vector<string> values;
        // std::ignore can be used to throw away a result
        tie(ignore, name, values) = parse_stuff();
        // std::tie can also be used to do other kind of
        // multi-operations besides assignment:
        return tie(a, b, c) < tie(a1, b1, c1);
        // Is the same as:
        if (a != a1) return a < a1;
        if (b != b1) return b < b1;
        return c < c1;

See here, here and here. Regular expressions We now have regular expressions!

        std::regex re(R"((\w+)\s+(\w+))");
        string s("antani blinda");
        smatch res;
        if (regex_match(s, res, re))
            cout << "OK " << res[1] << " -- " << res[2] << endl;

The syntax is ECMAScript by default and can be optionally changed to basic, extended, awk, grep, or egrep. See here and here. General-purpose smart pointers There is std::unique_ptr to code memory ownership explicitly, and std::shared_ptr as a reference counted pointer, and smart pointers can have custom destructors:

    unique_ptr<dirent, std::function<void(void*)>> dirbuf((dirent*)malloc(len), free);

See here and here. Miscellaneous other cool things Standard attribute specifiers

string errno_str(int error)
 
    char buf[256];
#if (_POSIX_C_SOURCE >= 200112L   _XOPEN_SOURCE >= 600) && ! _GNU_SOURCE
    strerror_r(errno, buf, 256);
    string res(buf);
#else
    string res(strerror_r(errno, buf, 256));
#endif
    return res;
 
[ [noreturn]] void throw_libc_error(int error)
 
    throw runtime_error(errno_str(error));

See here. Hash tables See here and look at the new containers unordered_set, unordered_map, unordered_multiset, and unordered_multimap. Multithreading There is a standard threading model, with quite a bit of library support: see here, here, here, and here for atomic data structures. Variadic templates Templates can now take variable number of arguments, and that opens possibilities for interesting code generation, like implementing a generic, type-safe printf statement, or something like this:

db.query(R"(
   INSERT INTO table NAMES (id, name, description)
     VALUES (?, ?, ?)
)", 4, "genio", "fantasia, intuizione, decisione, e velocit  di esecuzione");

See here and here. Essential tools You need at least g++ 4.8 or clang 3.3 to have full C++11 support. They will be both available in jessie, and for wheezy you can use the nightly clang packages repository. I cannot think of a good excuse not to use -Wall on new code. scan-build from clang is another nice resource for catching even more potential problems at compile time. valgrind is a great tool for runtime code analysis: valgrind --tool=memcheck (the default) will check your program for wrong memory accesses and memory leaks. valgrind --tool=callgrind will trace function calls for profiling, to be analyzed with kcachegrind.

valgrind
--tool=helgrind

can check multi-threaded programs for suspicious concurrent memory accesse patterns. And of course gdb: a nice trick with C++ is to issue catch throw to get a breakpoint at the point where an exception is being thrown. help catch provides a list of other interesting catch examples. Coredump tips: ulimit -c to enable core dumps, triggering a core dump with ^\, opening a core with gdb program core, and more details on man 5 core. An extra gdb tip, which is not related to C++ but helped me considerably recently, is that it can be attached to running python programs to get a live Python traceback.

25 September 2014

Aigars Mahinovs: Distributing third party applications via Docker?

Recently the discussions around how to distribute third party applications for "Linux" has become a new topic of the hour and for a good reason - Linux is becoming mainstream outside of free software world. While having each distribution have a perfectly packaged, version-controlled and natively compiled version of each application installable from a per-distribution repository in a simple and fully secured manner is a great solution for popular free software applications, this model is slightly less ideal for less popular apps and for non-free software applications. In these scenarios the developers of the software would want to do the packaging into some form, distribute that to end-users (either directly or trough some other channels, such as app stores) and have just one version that would work on any Linux distribution and keep working for a long while. For me the topic really hit at Debconf 14 where Linus voiced his frustrations with app distribution problems and also some of that was touched by Valve. Looking back we can see passionate discussions and interesting ideas on the subject from systemd developers (another) and Gnome developers (part2 and part3). After reading/watching all that I came away with the impression that I love many of the ideas expressed, but I am not as thrilled about the proposed solutions. The systemd managed zoo of btrfs volumes is something that I actually had a nightmare about. There are far simpler solutions with existing code that you can start working on right now. I would prefer basing Linux applications on Docker. Docker is a convenience layer on top of Linux cgroups and namespaces. Docker stores its images in a datastore that can be based on AUFS or btrfs or devicemapper or even plain files. It already has a semantic for defining images, creating them, running them, explicitly linking resources and controlling processes. Lets play a simple scenario on how third party applications should work on Linux. Third party application developer writes a new game for Linux. As his target he chooses one of the "application runtime" Docker images on Docker Hub. Let's say he chooses the latest Debian stable release. In that case he writes a simple Dockerfile that installs his build-dependencies and compiles his game in "debian-app-dev:wheezy" container. The output of that is a new folder containing all the compiled game resources and another Dockerfile - this one describes the runtime dependencies of the game. Now when a docker image is built from this compiled folder, it is based on "debian-app:wheezy" container that no longer has any development tools and is optimized for speed and size. After this build is complete the developer exports the Docker image into a file. This file can contain either the full system needed to run the new game or (after #8214 is implemented) just the filesystem layers with the actual game files and enough meta-data to reconstruct the full environment from public Docker repos. The developer can then distribute this file to the end user in the way that is comfortable for them. The end user would download the game file (either trough an app store app, app store website or in any other way) and import it into local Docker instance. For user convenience we would need to come with an file extension and create some GUIs to launch for double click, similar to GDebi. Here the user would be able to review what permissions the app needs to run (like GL access, PulseAudio, webcam, storage for save files, ...). Enough metainfo and cooperation would have to exist to allow desktop menu to detect installed "apps" in Docker and show shortcuts to launch them. When the user does so, a new Docker container is launched running the command provided by the developer inside the container. Other metadata would determine other docker run options, such as whether to link over a socket for talking to PulseAudio or whether to mount in a folder into the container to where the game would be able to save its save files. Or even if the application would be able to access X (or Wayland) at all. Behind the scenes the application is running from the contained and stable libraries, but talking to a limited and restricted set of system level services. Those would need to be kept backwards compatible once we start this process. On the sandboxing part, not only our third party application is running in a very limited environment, but also we can enhance our system services to recognize requests from such applications via cgroups. This can, for example, allow a window manager to mark all windows spawned by an application even if the are from a bunch of different processes. Also the window manager can now track all processes of a logical application from any of its windows. For updates the developer can simply create a new image and distribute the same size file as before, or, if the purchase is going via some kind of app-store application, the layers that actually changed can be rsynced over individually thus creating a much faster update experience. Images with the same base can share data, this would encourage creation of higher level base images, such as "debian-app-gamegl:wheezy" that all GL game developers could use thus getting a smaller installation package. After a while the question of updating abandonware will come up. Say there is is this cool game built on top of "debian-app-gamegl:wheezy", but now there was a security bug or some other issue that requires the base image to be updated, but that would not require a recompile or a change to the game itself. If this Docker proposal is realized, then either the end user or a redistributor can easily re-base the old Docker image of the game on a new base. Using this mechanism it would also be possible to handle incompatible changes to system services - ten years down the line AwesomeAudio replaces PulseAudio, so we create a new "debian-app-gamegl:wheezy.14" version that contains a replacement libpulse that actually talks to AwesomeAudio system service instead. There is no need to re-invent everything or push everything and now package management too into systemd or push non-distribution application management into distribution tools. Separating things into logical blocks does not hurt their interoperability, but it allows to recombine them in a different way for a different purpose or to replace some part to create a system with a radically different functionality. Or am I crazy and we should just go and sacrifice Docker, apt, dpkg, FHS and non-btrfs filesystems on the altar of systemd? P.S. You might get the impression that I dislike systemd. I love it! Like an init system. And I love the ideas and talent of the systemd developers. But I think that systemd should have nothing to do with application distribution or processes started by users. I am sometimes getting an uncomfortable feeling that systemd is morphing towards replacing the whole of System V jumping all the way to System D and rewriting, obsoleting or absorbing everything between the kernel and Gnome. In my opinion it would be far healthier for the community if all of these projects would developed and be usable separately from systemd, so that other solutions can compete on a level playing field. Or, maybe, we could just confess that what systemd is doing is creating a new Linux meta-distribution.

15 June 2014

Iustin Pop: Edge of Tomorrow, book vs. movie

Warning: some spoilers ahead. A few months ago I saw the trailer for Edge of Tomorrow (film). Normally I don't look for trailers, probably this was shared on G+ by someone; but the basic idea behind the trailer was interesting enough I tried to look for more information about the (then upcoming) movie. To my surprise, I learned that this Hollywood movie is based on a Japanese novel - All You Need Is Kill. Since (broadly speaking) Japanese style is quite different from Hollywood style, I became even more interested. So I bought the book, and went through it quite quickly - I liked it a lot, and for the most part is has kept me reading and reading. I would say the book is quite straightforward, with a bitter-sweet ending that is what I (dreaded and) expected from it. Fast forward a few months, and yesterday I saw the movie. I had somewhat lower expectations for it compared to the book, but I was surprised at how they managed to morph the Japanese setting in the book into an European one and give a good introduction into the plot. The downside is that they had to make it somewhat melodramatic here and there, and that they added quite a bit of extra plot to fill in the time; on the other hand, it skipped a lot of background detail that the book brought and which explains the setting of the war. The biggest change however was to the overall plot line: the book is only about a single battle, and makes it clear at the end that the war is far from over. The movie, in grand Hollywood style, solves the entire war in one neat swipe, and has a quite happy ending. Which is not bad per se, but doesn't have the same emotional impact as the book. Oh, and they made the aliens scarier but in a strange way prettier (or better said less alien). Interesting that they felt the need to do it: the aliens in the book were (definitely) scary by their behaviour/abilities, yet they had to make them "look scary" in a way that connects to our visceral feelings, rather to the logical fear that a non-earth-like life form would bring. Speaking about the cast: I would still have preferred the Japanese setting of the book compared to the more western one in the movie, but at least one of the two main characters in the movie had a well-chosen person playing it. Overall, I'd give the book a 4/5 rating, and the movie (still) a 3/5. I enjoyed both, and the main plot idea is, if not new in SF, still appealing.

4 January 2014

Simon Josefsson: Necrotizing Fasciitis

Dear World, On the morning of December 24th I felt an unusual pain in my left hand between the thumb and forefinger. The pain increased and in the afternoon I got a high fever, at some point above 40 degrees Celsius or 104 degree Fahrenheit. I went to the emergency department and was hospitalized during the night between the 24th and 25th of December. On the afternoon of December 26th I underwent surgery to find out what was happening, and was then diagnosed with Necrotizing Fasciitis (the wikipedia article on NF gives a fair summary), caused by the common streptococcus bacteria (again see wikipedia article on Streptococcus). A popular name for the disease is flesh-eating bacteria. Necrotizing Fasciitis is a rare and aggresive infection, often deadly if left untreated, that can move through the body at speeds of a couple of centimeters per hour. I have gone through 6 surgeries, leaving wounds all over my left hand and arm. I have felt afraid of what the disease will do to me, anxiety over what will happen in the future, confusion and uncertainty about how a disease like this can exist and that I get the right treatment since so little appears to be known about it. The feeling of loneliness and that nobody is helping, or even can help, has also been present. I have experienced pain. Even though pain is something I m less afraid of (I have a back problem) compared to other feelings, I needed help from several pain killers. I ve received normal Paracetamol, stronger NSAID s (e.g., Ketorolac/Toradol), several Opiate pain-killers including Alfentanil/Rapifen, Tramadol/Tradolan, OxyContin/OxyNorm, and Morphine. After the first and second surgery, nothing helped and I was still screaming with pain and kicking the bed. After the first surgery, I received a local anesthetic (a plexus block). After the second surgery, the doctors did not want to masquerade my pain, because sign of pain indicate further growth of the infection, and I was given the pain-dissociative drug Ketamine/Ketalar and the stress-releasing Clonidine/Catapresan. Once the third surgery removed all of the infection, pain went down, and I experienced many positive feelings. I am very grateful to be alive. I felt a strong sense of inner power when I started to fight back against the decease. I find joy in even the simplest of things, like being able to drink water or seeing trees outside the window. I cried out of happiness when I saw our children s room full of toys. I have learned many things about the human body, and I am curious by nature so I look forward to learn more. I hope to be able to draw strength from this incident, to help me prioritize better in my life. My loving wife sa has gone through a nightmare as a consequence of my diagnosis. At day she had to cope with daily life taking care of our wonderful 1-year old daughter Ingrid and 3-year old boy Alfred. All three of them had various degrees of strep throat with fever, caused by the same bacteria and anyone with young kids know how intense that alone can be. She gave me strength over the phone. She kept friends and relatives up to date about what happened, with the phone ringing all the time. She worked to get information out from the hospital about my status, sometimes being rudely treated and just being hanged up on. After a call with the doctor after the third surgery, when the infection had spread from the hand to within 5cm of my torso, she started to plan for a life without me. My last operation were on Thursday January 2nd and I left hospital the same day. I m writing this on the Saturday of January 4rd, although some details and external links have been added after that. I have regained access to my arm and hand and doing rehab to regain muscle control, while my body is healing. I m doing relaxation exercises to control pain and relax muscles, and took the last strong drug yesterday. Currently I take antibiotics (more precisely Clindamycin/Dalacin) and the common Paracetamol-based pain-killer Alvedon together with on-demand use of an also common NSAID containing Ibuprofen (Ipren). My wife and I were even out at a restaurant tonight. Fortunately I was healthy when this started, and with bi-weekly training sessions for the last 2 years I was physically at my strongest peak in my 38 year old life (weighting 78kg or 170lb, height 182cm or 6 feet). I started working out to improve back issues, increase strength, and prepare for getting older. Exercise has never been my thing although I think it is fun to run medium distances (up to 10km). I want thank everyone who helped me and our family through this, both professionally and personally, but I don t know where to start. You know who you are. You are the reason I m alive. Naturally, I want to focus on getting well and spend time with my family now. I don t yet know to what extent I will recover, but the prognosis is good. Don t expect anything from me in the communities and organization that I m active in (e.g., GNU, Debian, IETF, Yubico). I will come back as energy, time and priorities permits.

16 November 2013

Lars Wirzenius: Debian: developing it wrong

This is an essay form of a talk I have given today at the Cambridge Mini-debconf. The talk was videoed, so will presumably show up in the Debconf video archive eventually. Abstract Debian has a long and illustrious history. However, some of the things we do perhaps no longer make as much sense as they used to do. It's a new millennium, and we might find better ways of doing things. Things that used to be difficult might now be easy, if we dare look at things from a fresh perspective. I have been doing that at work, for the Baserock project, and this talk is a compilation of observations based on what I've learnt, concentrating on things that affect the development workflow of package maintainers. Introduction and background I have been a Debian developer since August, 1996 (modulo a couple of retirements), and have used it a little bit longer. I have done a variety of things for Debian, from maintaining PGP 2 packages to writing piuparts to blathering excessively on the mailing lists. My day job is to develop the Baserock system at Codethink. Baserock is a set of tools and workflows for developing embedded and appliance Linux systems. If you squint, it looks a bit like a source-based Linux distribution. I have worked on Baserock since September, 2011. Some of Baserock's design has been influenced by my experience with Debian. With Baserock I have the chance to fix all the things that are wrong in Debian, and this talk is me giving back to Debian by pointing out some the things I feel should be fixed. I don't have solutions for these problems: this is a bug report, not a patch. It's also perhaps a bit of a rant. I am specifically concentrating here on technical and tooling issues that affect the development process of Debian. I excluding social issues. I am also not trying to get Debian to switch to Baserock. Baserock is targeting embedded and appliance systems, and makes simplifying assumptions based on those targets, which Debian does not get to do. I am pointing out problems, and I am outlining solutions as implemented in Baserock, when I think the concept carries over well to Debian. Build tools should be intelligent, packaging should be dumb In Debian, the fundamental tool for compiling the upstream code and assembling a binary package is dpkg-buildpackage. It uses debian/rules, which is a Makefile with targets with specific names and semantics. By executing the right targets in the right order, dpkg-buildpackage tells the source package to build the binary package. On the one hand, this is a nice design, because it abstracts away the large variety of upstream build systems into one API for dpkg-buildpackage to use. On the other hand, it puts all the intelligence for how packages are built into the packages. Making packaging intelligent, rather than the build tool, means packagers need to do more work, and there's more work to do when, say, the Debian policy changes, or when there are other changes that affect a large number of packages. If packaging is intelligent, then every package needs changes. If the build tool is intelligent, then you change the build tool and re-build everything. In Baserock we put as much intelligence as we can into the Morph tool, which drives the build process. It turns out that today, unlike in 1995, most upstream projects use one of a handful well-known build systems: autotools, cpan, cmake, Python distutils, etc. With just a little extra logic in Morph we avoid having any delta against upstream in Baserock. This doesn't work for quite all upstream projects, of course, but we've spent the 2% (two, not twenty) of effort that solves 80% of the problem. In recent years, the dh approach to packaging has made a lot of packages be able to have only a minimal, 3-line debian/rules file. This is excellent. Wouldn't it be nice if even that wasn't needed? It'd save thousands of files in source packages across the archive, at least. It would be easy to: if the file is missing, have dpkg-buildpackage assume the 3-line version by default. Getting rid of a single file is, of course, not a particularly big win. The big win is the change in mindset: rather than dealing with all new issues in development by adding yet more fields to debian/control and more optional, competing tooling outside the core toolset, if you improve the tools everyone uses, then everyone's packages gets better. The goal should, in my opinion, be that for the large number of packages where upstream uses a well-known, well-behaved build system, and uses it in a reasonably sensible way, the Debian source package should not require anything added to make the package build. There will still be a need to add some stuff, such as the debian/copyright file, to make a good Debian package, but just getting the package to build should require nothing extra. (Side note: wouldn't it be nice if there was a well-known, widely used way to mark up copyright information so that debian/copyright could be constructed and updated automatically?) Configuration file handling on upgrades: add ucf to dpkg already In the 1990s, dpkg had excellent handling of configuration files and merging local changes with changes from the new package version, but it was excellent only because it tried to do that at all, and mostly nothing else did. It hasn't changed much, since, and it's not excellent on any absolute scale. We have the ucf tool, which can do a better job, but requires to be added to each package wanting to use it. Why don't we make dpkg smarter, instead? If ucf is not not good enough to be merged wholesale into dpkg, let's write something better. Making every package maintainer use ucf manually is just wasteful. This is not the kind of thing that should be done for each package separately: the package manager should be smart so that packaging can be stupid. The goal should be that dpkg is smart enough in its configuration file handling that having the package do it is a very rare special case. Clean building shouldn't be hard The basic tool for building a package is dpkg-buildpackage. It is ever so slightly cumbersome to use, so there are some wrappers, most importantly debuild. However, if you're making a build intended to be uploaded to the Debian archive, you should be doing a clean build. This means having to learn, configure, and use yet more tools. A clean build is important: security updates, development, debugging, quality control, porting, and user support become more difficult if we don't know how a package was built, and can't reproduce the build. It gets harder to keep build dependencies correct, making it harder for everyone to build things. Luckily, Debian has solved the clean build problem. Unluckily, it has solved it multiple times, creating the problem of having to choose the clean building approach you want to use. The default way of building is not clean, so then you have to remember to use the non-standard way. You also get to spend time maintaining the clean build environments yourself, since that doesn't seem to be fully automated. None of this is hard, as such, but it's extra friction in the development workflow. The primary approaches for clean building in Debian seem to be pbuilder, cowbuilder, and sbuild. I happen to use pbuilder myself, because it's what's been around the longest, but I make no claim of having made an informed choice. That is part of the problem here: why should I have to spend the effort to become informed to make a choice well? Why is the default way of building not clean? Don't say performance: Morph sets up a clean staging area in seconds, and does not offer you a choice of not doing so. It's a chroot, with everything but the build tree hardlinked from cached, unpacked build dependencies, and protected using read-only bind mounts. What's more, this approach avoids having to maintain the pbuilder base tarballs or sbuild chroots manually. It's all automatic, and up to date, for every build. Furthermore, the staging area contains only the specified build dependencies, and not anything else, meaning a build fails if a build dependency is missing, rather than succeeding because it happens to be in the default set of packages the build tool installs. Mass building shouldn't be hard Suppose you want to try out a large-scale change in Debian. It might be trying out a new version of GCC, or using llvm's clang as the default C compiler, or updating glibc, or doing a large library transition such as a new version of GTK+, or trying a new dpkg version, or even something more exploratory such as changing the default optimisation flags for all packages to see what breaks. All of these will require you to at least rebuild all affected packages. Ideally you'd test the built packages as well, but let's concentrate on the building for now. Here's what you do. You make the changes you want to try out, and build those packages. You create a dedicated APT repository, and upload your packages there. You configure your build environment to favour that APT repository. You make a list of all reverse build dependencies of the packages you changed. You write a script to build all of those, preferably so that if you change A, and rebuild B, then you also rebuild C which build depends on B. Each rebuilt package you also upload to your APT repository. You keep track of the build log, and success or failure, of each build. The people in Debian who do this kind of stuff regularly presumably have tools for doing it. It shouldn't be a rare, special thing, though. If my package has reverse build dependencies, I should at least consider testing building when I'm making changes. Otherwise, it might take years until the reverse build dependencies are rebuilt, and the problem is only found then, making it harder to fix. To be fair, building a lot of packages takes a lot of resources. It's not feasible to rebuild everything in Debian every time there's any change to, say, eglibc. However, it's feasible to do it, for large subsets of the archive, without huge hardware investments. One VCS to rule them all In 1995 there was really only one relevant version control system: CVS. It was not a great tool. In 2000, another contender existed: Subversion. It fixed some problems in CVS, but still wasn't a great tool. In 2005, there was a great upheaval and distributed version control systems started to become mainstream. There were a large handful of them. In 2010, it was becoming pretty clear that git had won. It's ugly, but it's powerful. I'm not going to debate the relative merits of different version control systems. Until recently, I was a Bazaar boy, and all of my personal projects were kept in Bazaar. (I have recently switched everything to git.) There are, however, strong benefits from everyone using the same system. Developers don't need to learn a dozen version control systems. Tools that operate on many repositories are easier to write and maintain. Workflows become simpler if one system can be assumed. Debian has a strong historical tendency to choose every option. This is sometimes a good thing, and sometimes a bad thing. For keeping source packages in version control I believe it is to be a bad thing. The status quo is that a Debian source package may not be in version control at all, or it might be in any version control system. This is acceptable when everyone only ever needs to maintain their own packages. However, in a distribution the size of Debian, that is not the case. NMUs, security support, archive wide transitions, and other situations arise when I might need to change yours, or you might need to change mine. We try to work around this by having a complicated source package format, using quilt to maintain patches to upstream semi-manually in a debian/patches directory. This is an awkward workflow. It's a workflow that trips up those that are not used to it. (I know quilt is a patch management system, not a version control system. I think git does it much better anyway.) It would be oh so much easier if everyone kept their source packages in the same, real version control system. I don't even care what it is, as long as it is powerful enough to handle the use cases we have. Imagine a world where every Debian source package is kept, for argument's sake, git, and everyone also uses the same layout and roughly the same workflow to maintain it. What would this mean? It would mean that if you want to inspect the history of your package, you know how to do that. If you want to merge in some bugfix from upstream code, you know how to do that, without having to figure out which of the several source package formats are in use. It would make feasible the development of more powerful, more higher level tooling. For example, it would allow Debian to have what we call system branches in Baserock. In Debian we have stable, testing, unstable, and experimental. We may get something like Ubuntu's PPAs, or perhaps an improved version of those. These are very poor versions of system branches, just like quilt is a poor way to manage patches and deltas against upstream. For example, you can upload an experimental version of gccto experimental, but then nobody else can upload another experimental version. You can set up your own PPA for this, but you'll still be affected by all the uploads to unstable while you're working. A Baserock system branch is a branch of the entire system, or the entire distribution in the case of Debian. It is isolated from other system branches. In branch in an individual repository is a well-known concept. A system branch is conceptually like branching every repository in the distribution at once. The actual implementation is more efficient, of course. This would be possible to implement without standardising on one version control system, but it would be much harder to implement, and would have to live with the lowest common denominator for features. CVS and Subversion, for example, don't really to merges, where Bazaar, Mercurial, and git do. Possible does not mean feasible. Any work you do in a system branch is isolated. Your work doesn't affect others, and theirs doesn't affect yours, until a merge happens. This is a simple, but very powerful tool. Cheap system branches, and powerful merging, makes it possible to do experiments safely, with little fuss. Combine that with being able to build everything cleanly and quickly, and get into a situation where there's no need to have make technical decisions based on arguments on mailing lists, and instead they can be done by choosing by looking at working code. I don't know how this could be implemented in Debian, but think about it. If Debian could have this, it might many archive-scale changes easier. debian/rules clean: really? One of the silliest things we require of packages is that they have a debian/rules clean rule that cleans up after a build perfectly so that we can do repeated builds in the same source tree. Let's just use git clean -fdx instead. This is a problem that is superbly well suited for automation. There is no point whatsover making packagers do any manual work for this. Large scale semi-mechanical changes require too much effort About a decade ago, we decided to follow a new version of the Filesystem Hierarchy Standard and transition from /usr/doc to /usr/share/doc. This was an almost entirely mechanical change: in many cases, a mere rebuild would fix it, and in almost every other case it was just a minor tweak to the packaging. A one-line change. It took us seven years to do this. Seven years. Think about it. In a recent discussion about building R binary data files from source at package build time it was suggested that we take 2-3 release cycles to get this done. That's four to six years. Think about it. These are not isolated cases. Every time we need to make a change that affects more than a small handful of packages, it becomes a major undertaking. Most of the time all the people involved in this are agreeable to the change, and welcome it. The change takes long because it requires co-ordinating a large number of people. Sometimes people are just busy. Sometimes they've left the project, but haven't properly orphaned their packages. Waiting for a timeout for answers about packages drags the process longer. Mechanical changes, or semi-mechanical ones, which are very easy and very quick to do should not take years. They should take an evening, no more. There's no end of changes we might want to do like this. In 2005 and 2006 I filed a few hundred bugs from failed piuparts runs. They're still being fixed, even when the fix is simple, such as adding a postrm script to remove, when the package is purged, a configuration file created by postinst, or starting the service with invoke-rc.d rather than running an init.d script directly. Mostly one-line changes. There's more mechanical changes that might happen. For example, changing the value of the VCS-Browser field when the Debian version control server changes domain names. It's not just about simple, mechanical changes, either. Transitions of important library packages, for example, which require changes to reverse dependencies due to a changed API are another source of pain. What should be a few evenings of build fixes can drag out to months of co-ordination. This is caused partly by technical issues, and partly due to social issues. The main social issue is that we have quite a strong ownership of packages, and NMUs are to be done carefully, and only in some cases. This is both good and bad, and I won't discuss that aspect now. The technical issues are that our tools are primarily intended for maintaining individual packages, rather than all of them together, making it harder to make even simple changes in large number of packages. In addition to easy mass building and system branches, as outlined above, large-scale changes would require testing tools so that you don't build the new package versions, but also test things automatically. Essentially, CI at the distribution level. Conclusion I've listed above a small variety of problems I see in the Debian development processes and tools. They're not the important part of this talk. The important part is that it we, the Debian developers, should look at our tools, and workflows, critically, and improve them when we can. Even small improvements are very useful, when they affect each of our twenty thousand source packages. The important change I argue for here is one in mindset, rather than any specific problem in any tool. We need a mindset of constant, incremental improvement for our tools.

5 November 2013

Russ Allbery: Review: Asimov's, August 2011

Review: Asimov's Science Fiction, August 2011

Editor:	Sheila Williams
Issue:	Volume 35, No. 8
ISSN:	1065-2698
Pages:	112

The editorial this issue is about the Dell Magazines Award, so newsy without a lot of content (particularly since we don't get to read the winners). Silverberg's column is much more interesting, focusing on the complex rules of the honor code of the Albanian highlands and pointing out that there's more complexity and strangeness in things we can find on Earth than in a lot of science fiction. Even within the rules of real human cultures. James Patrick Kelly's column this issue is an interesting summary of what graduates of the Clarion writing workshop learned, in their own words. It's a nice barrage of quick writing tips, and an interesting view of what people take away from a writing workshop. The book review column this issue is Peter Heck's normal workman-like job. "The End of the Line" by Robert Silverberg: Someday, I really should read Lord Valentine's Castle. As you might guess from this comment, this is another Majipoor story. This one follows an official who is part of an advance party for the Coronal Lord, the ruler (of sorts). He's decided, as part of those duties, to take the opportunity to learn more about the aboriginal people of Majipoor: the somewhat mysterious metamorphs, or Piurivar. I'm not that familiar with the history of Majipoor, since all I've read is this story and one other, none of the novels. Apparently, knowing that the official in question is named Stiamot will place this story for more familiar readers. For the unfamiliar, such as myself, there's a lot of politics here and what's clearly background for a major event in the world, but as a standalone story it's a bit unsatisfying (and grim). It's not that clear why things had to turn out the way they did, and the characters seem largely without agency. It's well-written, but mostly a story for fans of the series, I think. (6) "Corn Teeth" by Melanie Tem: This is a tight third-person story about a human child raised by alien foster parents, and it's deeply disturbing. Not because of the fostering, which appears wonderful and loving, but because it's a train wreck sort of story: the reader can see the horrible coming and can't do anything about it (and it turns out even worse than one might expect). It's also a story built around failure of communication and failure of empathy, and has a monumentally depressing ending, the kind that leaves scars. I'm sure all of this is entirely intentional; it seems very well-written. But I really didn't want this much horrible misunderstanding and depressing hopelessness in my reading. (2) "Watch Bees" by Philip Brewer: I rather liked this story even if the protagonist is an awful person. The story is set in a future of bioengineered insects and hard economic times, and it features a man who is supposedly working his way from farm to farm to get back home. What he's actually after is more complicated and is closely related to the defense systems that keep intruders off of farms. You might guess some of the rest from the title. It's a story about understanding layered defense systems, and about economic warfare. I didn't care much for any of the characters, but the story is well-plotted and kept me interested in seeing what would happen next. (6) "For I Have Lain Me Down on the Stones of Loneliness and I'll Not Be Back Again" by Michael Swanwick: By Swanwick, so it's a little odd, but I found this story surprisingly moving and ambiguous. It's about an American of Irish descent, a trip to Ireland, and a love affair with a fierce Irish nationalist, all set against an SF background of an Earth conquered by benevolent aliens. It's angry, uncertain, fanatical, and realistic by turns, and left me with profound mixed feelings. I think it does a good job capturing in a brief story the emotional complexities of what it means to give onself to a cause. (7) "We Were Wonder Scouts" by Will Ludwigsen: This is a short and odd story about a variant of the Boy Scouts founded by a man who is a little too obsessed with the paranormal, and an outing in the woods that turns rather creepy. It's a type of story that I'm not very fond of: one that twists the delight of discovery into something dark and mundane. I suppose you could call it horror; it's more horror than fantasy, at least. Anyway, not my thing. (3) "Pairs" by Zachary Jernigan: This story, on the other hand, isn't as dark as it seems like it should be from the setup. Humans have been conquered and enslaved by more powerful alien races, and now human souls are a profitable business. The protagonist is a person who has been embodied in a spaceship, and who works with (and monitors) another largely insane embodied person as they work as couriers, carrying souls to their buyers. But neither of them are as fully under control as they might appear, which is the point of the story. There is no grand tale of redemption, and the price is high, but I found the psychological portrayal oddly satisfying and faintly hopeful, and I was intrigued by the world background. (7) "Paradise is a Walled Garden" by Lisa Goldstein: The cover story, this is by far the best story of the issue. It's steampunk, set in a world where Muslim civilization was not pressed back by Christianity and continues to thrive into the reign of Queen Elizabeth as, among other things, makers of automata. A girl has managed to get herself a job in a British factory by posing as a boy and is the first to sound the alarm when the automata that do most of the work go strangely (and violently) haywire. That leads to her being assigned to the subsequent delegation to Al-Andulus (Muslim Spain, if you're not familiar with that name from history). The protagonist is the best part of this story. She's thoughtful, resourceful, and delights in learning things, something that she's rarely had the opportunity to do. She's also utterly unintimidated. In Al-Andulus, she thrives, despite the contempt of the leader of the expedition and some dangerous intrigues around the source of the anomalous behavior. I won't spoil the ending, but it's a delight, leaving the reader with a lot of hope for her future. I also liked the portrayal of the Muslim world, which is engrossed in its own business and has its own advantages and disadvantages, but is at least open to and focused on learning and technological development. The contrast with the superstitious British delegation is both pointed and historically grounded in Muslim relations with Europe around the point of divergence of Goldstein's world. (8) Rating: 7 out of 10

2 November 2013

Russ Allbery: Review: Fantasy & Science Fiction, May/June 2011

Review: Fantasy & Science Fiction, May/June 2011

Editor:	Gordon van Gelder
Issue:	Volume 120, No. 5 & 6
ISSN:	1095-8258
Pages:	258

The editorial in this issue is about the investigation into the troubling death of long-time contributor F. Gwynplain MacIntyre (which was not his real name). It's disturbing, but to me it underscores one of the things that I love about the Internet: people for whom life isn't working very well can still find an outlet, make friendships, and control how they choose to present themselves to the world on-line. That's something quite valuable, and part of why the pushes for "real names" always gives me pause. Somewhat thematically related, this issue also features a non-fiction essay by Maria E. Alonzo about her investigation of Jesse Francis McComas, her great-uncle but better known to the SF community as one of the founding editors of F&SF and co-editor of the famous classic anthology Adventures in Time and Space. This is mostly a curiosity, but it's fun to read about the sense of triumph in tracking down lost family history. This issue also features a Chris Moriarty book review column, always a plus, as well as a few positive reviews of obscure superhero movies by Kathi Maio (plus the required grumbling about a more mainstream film). "The Final Verse" by Chet Williamson: This is more of a horror story than I would normally like, but I got pulled into the investigation of an old bluegrass song and the guesswork and footwork required to track down where it came from. Williamson does a good job with the tone and first-person narration, and the degree to which the protagonist cares about the song to the exclusion of the horrific happenings of the story blunts the horror. Not quite my thing, but I thought it was well-done and played well with the possible meanings of song lyrics. (6) "Stock Photos" by Robert Reed: This is well-written, like nearly all Reed stories, but it lacked enough clues for the reader for me. It's a very short story about a man who's out mowing his lawn when approached by two strangers who apparently want to take photographs of him for stock image collections. Then things get rather weird, but without any explanation, and the ending lost me completely. Frustrating. (It is partially explained by the later "The Road Ahead" story in this same issue.) (4) "The Black Mountain" by Albert E. Cowdrey: From one of F&SF's most reliable story-tellers to another, and this is a more typical story. Cowdrey offers an abandoned and very strange cathedral for an obscure religion, a conflict over a development project, and some rather creepy results, all told in Cowdrey's entertaining fashion. Some places you just don't mess with. (6) "Agent of Change" by Steven Popkes: Told Dos-Passos-style with news excerpts, web sites, and the transcript of an emergency committee, this story shows the discovery of Godzilla, or something akin to Godzilla, in the Pacific, where it's destroying whaling vessels. I do like this style of storytelling, and here it mixes well with humor and a bit of parody as Popkes shows how each different news outlet puts its own recognizable spin on the story. The story isn't particularly memorable, and it doesn't end so much as just stop, but it was fun. (7) "Fine Green Dust" by Don Webb: This story is dedicated to Neal Barrett, which will give SFF short story readers a warning of weirdness to come. In a near future where global warming as continued to make summers even more miserable, the protagonist happens across a naked woman painted green. The green turns out to be a sun block that claims to assist humans in metamorphosis into animals. Most of the story is the protagonist trying to decide what to think of that, interspersed with staring at his neighbor's naked daughter. It's mildly amusing if you don't think about it too much and don't mind the rather prominent male gaze. (5) "Rampion" by Alexandra Duncan: The novella of the story, this is set in Muslim Spain some time during the long fights between Muslims and Christians in the north. It's told as two parallel stories: one telling the protagonist's first meeting with his love, and the second following him as a blind man, some time later, deciding whether, and how, to re-engage with the world. The style feels like fantasy, but there's very little overt fantasy here, and the story could be read as historical adventure. It's good adventure, though; conventional in construction, but with some romance and some drama and a good ending. (7) "Signs of Life" by Carter Scholz: This is to science fiction what "Rampion" is to fantasy: not really SF in the classic sense, but fiction about the process of science. The protagonist works on gene sequencing and is mildly obsessed with a visualization of junk DNA in an attempt to find patterns in it. Like a lot of fiction about science, it's primarily concerned with office politics, grant funding, and an awful boss. There is a faint touch of the supernatural, but that strand of the story doesn't amount to much. There's a happy ending of sorts, but the story left me with a bad taste in my mouth, and I'd completely forgotten it by the time I sat down to write this review. (4) "Starship Dazzle" by Scott Bradfield: I've never seen much in Bradfield's ongoing series of stories about Dazzle, the talking dog. In this one, he's sent via rocket on a one-way trip into outer space and ends up making a bizarre sort of first contact. Like the other Dazzle stories, it's full of attempts at humor that don't really work for me, even though you'd think I'd be sympathetic to the mocking of our commercialization of everything. The ending is just silly, and not in a good way. (3) "The Old Terrologist's Tale" by S.L. Gilbow: I love the setup for this story. It's set in some sort of far future in which terraforming has become routine, and a group of people are telling each other stories over drinks. The first-person protagonist is a terrologist, someone who designs planets (and the technology is available to do this almost from scratch). The conversation is taking a turn towards the humiliating, with a politician belittling the work of terrologists, when an old terrologist who has been listening quietly starts telling a story about designing worlds, both mundane and dangerously beautiful. Gilbow does a great job here capturing blithe self-importance, the habit of belittling other people's technical work, and revenge via storytelling with a nasty barb. This was my favorite story of the issue. (7) "Altogether Elsewhere, Vast Herds of Reindeer" by Ken Liu: This is a rather odd but quite touching story about mothers, daughters, nature, connection, and uploading. It's set after a singularity, in a time when all humans are uploaded into computers and exploring higher dimensions, digital natives in a much deeper sense than is meant today. But Rene 's mother is an Ancient, from before the singularity and still three-dimensional, and she wants to spend some time with her daughter. That leads to a memorable moment of connection, without pulling Rene entirely out of her father's world. Well done. (7) "The Road Ahead" by Robert Reed: Two Reed stories in one issue! And this one is a sequel to "Stock Photos" from earlier, since apparently I wasn't the only one who found it hopelessly confusing. It provides some backstory and makes a bit more sense of the first story, and that also makes it a more interesting story in its own right. The stock photo concept wasn't entirely a lie, as I had thought it was after the first story. There is analysis, anticipation, and trends behind who the pair take pictures of. But this story explores some internal tension, some conflict between them and some knowledge that the woman has that the man doesn't. And in the process it makes everything creepier, but also more interesting, and provides a hint at a really dark way of viewing the news media. I would say that this salvages "Stock Photos," except that I don't think "Stock Photos" is necessary now that one can read this story. (7) "Music Makers" by Kate Wilhelm: This is another story about investigation of the history of music, mingled with the supernatural, but unlike the story that opened this issue, it's not horror. Rather, it's a gentle and sweet fantasy about the power of music and benevolent ghosts and a community coming together. It's a positive and happy note on which to end the issue. (6) Rating: 6 out of 10

3 September 2013

Joachim Breitner: Bachelor Thesis on Monads for Uncertainty

A work-related blog post for a change, but still about Haskell. My student Alexander Kuhnle has submitted his bachelor thesis Modeling Uncertain Data using Monads and an Application to the Sequence Alignment Problem today, in which he explores way to generalize algorithms from bio informatics (in particular suffix tree) to work on data with uncertainties of various kinds (indeterminism, probabilities etc.). He utilizes monads and variations thereof to make the code polymorphic in the particular kind of uncertainty. If this sounds interesting to you, have a look, and feel free to share your optinion with us.

14 March 2013

Iustin Pop: Types as control flow constructs

A bug I've recently seen in production code gave me the idea for this blog post. Probably smarter people already wrote better things on this topic, so this is mostly for myself, to better summarise my own thoughts. Corrections are welcome, please leave a comment! Let's say we have a somewhat standard API in Python or C++:

a function (init_t) to create/initialise our data type, which returns an object of type t
the object then has some properties or methods that we can call

Signalling failures to initialise the object can be done in two ways: either by raising an exception, or by returning a null/None result. There are advantages and disadvantages to both:

if using exceptions, then you must be sure to catch and handle all possible exceptions that the initialisation function can raise (otherwise more of your code can be aborted than you want)
if using None as return value, then you must check that the value is correct before using it, otherwise you get an exception or null pointer dereference

The None model can create latent bugs, for example in the following code:

ok = True
for arg in input_list:
  t = init_t(arg)
  if t is None:
    ok = False
    continue
  t.process()
return ok

The presence of the continue statement there is critical. Removing it, or moving it after some other statements which work with t will result in a bug. So using value-based returns forces us to introduce (manually) control points, without having the possibility to validate the model by the compiler (e.g. in C++). So it would seem that this kind of latent bugs pushes us to use the exception model, with its drawbacks. Let's look at how this interface would be implemented in (IMHO) idiomatic Haskell (where the a and b types represent the input and output types):

initT :: a -> Maybe T
processT :: T -> b
my_fn =
  
 case initT arg of
   Nothing -> -- handle failure
   Just v -> processT v

Yes, this can be written better, but it's beside the main point. The main point is that by introducing a wrapper type around our main type (T), we are forced via the type system to handle the failure case. We can't simply pass the result of initT to a function which accepts T, because it won't type check. And, no matter what we do with the result value, there are no exceptions involved here, so we only have to think about types/values, and not control flow changes. In effect, types become automatically-validated control-flow instructions. Or so it looks to me . So using types properly, we can avoid the exception-vs-return-value debate, and have all advantages without the disadvantages of either. If that is the case, why isn't this technique used more in other languages? At least in statically typed languages, it would be possible to implement it (I believe), via a C++ template, for example. In Python, you can't actually apply it, as there's no static way of enforcing the correct decapsulation. I was very saddened to see that Google's Go language, which is quite recent, has many examples where initialisation functions return a tuple err, value = , separating the actual value from the error, making it not safer than Python. It might be that polymorphic types are not as easy to work with, or it might be the lack of pattern matching. In any case, I don't think this was the last time I've seen a null pointer dereference (or the equivalent AttributeError: 'NoneType' object has no attribute ). Sadly You can even go further in Haskell and introduce more control flow structure via wrapper types. Please bear another contrived example: an HTML form that gets some input data from the user, validates it, saves it to the database, and echoes it back to the user. Without types, you would have to perform these steps manually, and ensure they are kept in the correct order when doing future modifications. With types, you only have to design the types correctly and export only smart contructors (but not the plain ones):

module Foo ( ValidatedValue
           , validateValue
           , RecordId
           , CommittedValue
           , commitValue
           , buildFeedbackForm
           ) where
data ValidatedValue a = ValidatedValue a
validateValue :: a -> Maybe (ValidatedValue a)
data RecordId =  
data CommittedValue a = CommittedValue a RecordId
commitValue :: ValidatedValue a -> ComittedValue a
buildFeedbackForm :: CommittedValue a -> HTMLDoc

From these types, it follows more or less that the only correct workflow is:

get a value from the user
validate it
commit it, getting a transaction/record ID
send the response to the user

In other words:

handleFeedbackForm = buildFeedbackForm . commitValue . validateValue

There are still issues here, e.g. the type a is completely hidden behind the wrapper types, and we can't recover some basic properties (even if we use newtype, unless we use GeneralizedNewtypeDeriving). But it does offer a way to improve control flow safety. And that is my 0.02 currency unit for today.

7 May 2012

Joachim Breitner: Free Groups in Agda

I must say that I do like free groups. At least whenever I play around with some theorem provers, I find myself formalizing free groups in them. For Isabelle, my development of free groups is already part of the Archive of Formal Proofs. Now I became interested in the theorem prover/programming language Agda,so I did it there as well. I was curious how well Agda is suited for doing math, and how comfortable with intuitionalistic logic I d be.
At first I wanted to follow the same path again and tried to define the free group on the set of fully reduced words. This is the natural way in Isabelle, where the existing setup for groups expects you to define the carrier as a subset of an existing type (the type here being lists of generators and their inverses). But I did not get far, and also I had to start using stuff like DecidableEquivalence, an indication that this might not go well with the intuitionalistic logic. So I changed my approach and defined the free group on all words as elements of the group, with a suitable equivalence relation. This allowed me define the free group construction and show its group properties without any smell of classical logic. The agda files can be found in my darcs repository, and the HTML export can be browsed: Generators.agda defines the sets-of-generators-and-inverses and FreeGroups.agda (parametrized by the Setoid it is defined over) the reduction relation and the group axioms. Here are some observations I (disclaimer: Agda-beginer) made:

Fun fact: Free groups exist not only in classical logic.
Without any automation as in Isabelle, even simple things get quite complicated. A simple substitution of an equality with subst requires me to specify not only the equality and the term I want it to apply, but also to repeat the common part of the terms. Or when using the associativity of list concatenation, I have to pass all three sublists to the lemma. Maybe I am a bit spoiled by Isabelle, but I d be worried that this would prevent large proofs.
The levels are also annoying. Although my theory stays within one level, I have to annotate it everywhere. I d expect the type inference to figure this out for me.
Equality reasoning with begin ... is quite nice and surprisingly well readable.
Besides the additional work, it is nice to be able to do the proof in almost all detail. There is a limitation, though, as some steps are done automatically (if they happen to occur when evaluating/normalizing a term) and the others, even if similar-looking, are not.
It d be great if one would be free in the choice of editor, but vim users generally have a hard time in the field of theorem provers.

If I were to extend this theory, there are two important facts to be shown: That there is a unique reduced word in every equivalence class (norm_form_uniq), and the universal property of the free group. For the former (started in NormalForm.agda) I m missing some general lemmas about relations (e.g. that local confluence implies global confluence, and even the reflexive, symmetric, transitive hull is missing in the standard library). For the latter, some general notions such as a group homomorphism need to be developed first. I planned to compare the two developments, Isabelle and Agda. But as they turned out to show quite things in different orders, this is not really possible any more. One motivation to look at Agda was to see if a dependently typed language frees me from doing lots of set-element-checking (see the mems lemma in the Isabelle proof of the Ping-Pong-Lemma). So far I had no such problems, but I did not get far enough yet to actually tell. Thanks to Helmut Grohne for an educating evening of Agda hacking!

Flattr this post

26 April 2012

Wouter Verhelst: Switching to duckduckgo

In the late 90s, google became popular for one reason: because they had a no-nonsense frontpage that loaded quickly and didn't try to play with your mind. Well, at least that was my motivation for switching. The fact that they were using a revolutionary new search algorithm which changed the way you search the web had nothing to do with it, but was a nice extra. Over the years, that small hand-written frontpage has morphed into something else. A behind-the-scenes look at the page shows that it's no longer the hand-written simple form of old, but something horrible that went through a minifier (read: obfuscator). Even so, a quick check against the Internet Wayback machine shows that the size of that page has increased twenty-fold, which is a lot. But I could live with that, since at least it looked superficially similar. Recently, however, they've changed their frontpage so that search-as-you-type is enabled by default. Switching that off requires you to log in. So, you have a choice between giving up your privacy by logging in before you enter a search term, or by having everything you type, including any typos and stuff you may not have confirmed yet, be sent over to a data center god knows where. Additionally, at the first character you type, the front page switches away to the results page, causing me to go "uh?!?" as I try to find where they moved my cursor to. This is annoying. Duckduckgo doesn't do these things; and since they also don't do things like combining my typing skills, phone contact list, calendar, and chat history to figure out that I might be interested in a date, I'm a lot more comfortable using them. So a few days ago, I decided to switch my default search engine in chromium to duckduckgo. It still feels a bit weird, to be using a browser written by one search engine to search something on another; but all in all, it's been a positive experience. And the fact that wikipedia results are shown first, followed by (maybe) one ad, followed by other search results, is refreshing. We'll see how far this gets us.

19 February 2012

Gregor Herrmann: RC bugs 2012/07

thanks to the Paris BSP & other activities we're seeing a nice decline in RC bugs. here are my recent contributions:

~~#554752~~ src:gtklp: "FTBFS with binutils-gold"
add patch from Hideki Yamane, upload to DELAYED/2
~~#640613~~ src:jaminid: "jaminid: FTBFS: xargs: /usr/lib/jvm/java-6-openjdk/bin/javac: No such file or directory"
build-depend on default-jdk and set JAVA_HOME in debian/rules accordingly, upload to DELAYED/2
~~#640630~~ src:policycoreutils: "policycoreutils: FTBFS: chmod: cannot access /build/policycoreutils-r6ObWS/policycoreutils-2.0.82/debian/policycoreutils/etc/init.d/policycoreutils': No such file or directory"
apply patch from Ubuntu / Mitsuya Shibata, upload to DELAYED/4
~~#640633~~ src:tunnelx: "tunnelx: FTBFS: xargs: /usr/lib/jvm/java-6-openjdk/bin/javac: No such file or directory"
build-depend on default-jdk and set JAVA_HOME in debian/rules accordingly, upload to DELAYED/2
~~#642686~~ src:javamorph: "javamorph: FTBFS: /bin/sh: 1: /usr/lib/jvm/java-6-openjdk/bin/javac: not found"
build-depend on default-jdk and set JAVA in Makefile accordingly, upload to DELAYED/2
~~#645493~~ src:postler: "Will FTBFS with libindicate 0.6"
apply patch from Evgeni Golov, upload to DELAYED/2
~~#646388~~ speech-dispatcher: "FTBFS due to locally changed files without patch"
update autotools handling, partly taken from Ubuntu / Luke Yelavich, upload to DELAYED/2
~~#646647~~ src:latex-cjk-chinese-arphic: "latex-cjk-chinese-arphic: FTBFS: The requested file, gbsn00lp.ttf, does not exist"
apply (modified) patch from Hideki Yamane, upload to DELAYED/2
~~#652222~~ src:sensors-applet: "sensors-applet: FTBFS: Could not parse DTD http://scrollkeeper.sourceforge.net/dtds/scrollkeeper-omf-1.0/scrollkeeper-omf.dtd"
add patch to use local DTD, upload to DELAYED/2
~~#652758~~ src:policycoreutils: "policycoreutils: FTBFS: cc: error: /usr/lib/libsepol.a: No such file or directory"
multiarchify, based on patch by Hideki Yamane, upload to DELAYED/4
~~#658054~~ instead: "Can not upgrade package instead, because new package instead-data try rewrite files from old package instead."
add a comment to the BTS

2 February 2012

Marco Silva: Using Horizontal Sharding in SQLAlchemy to display multiple DBs

My problem was: I had a number of databases generated in different machines and I wanted to query them as if they were one, using the database the data came from as a field while querying and while showing results. The databases are SQLite3 files, generated using SQLAlchemy in a Python program. I solved this by using SQLAlchemy, which was good because I could use the same ORM mapping that the program used. I noticed that the Horizontal Sharding SQLAlchemy extension would fit well the problem, although not perfectly. I had to make some changes in some classes of this extension, and now it works fine. It was possible to filter the data using database as a criteria, but I couldn't get the database information from each line of a query result. I made a simple patch to SQLAlchemy, which wasn't likely to be introduced in the distribution, but worked for me, and sent it to its bug tracker. The change was included in SQLAlchemy in a very different fashion, as expected, but since I'm using the released version from SQLAlchemy, I kept on using my version of the patch. I don't want to do direct changes in SQLAlchemy source code, so I made the change in my program:

class ShardedSessionShardId(ShardedSession):
    def __init__(self, *args, **kwargs):
        super(ShardedSessionShardId, self).__init__(*args, **kwargs)
        self._query_cls = ShardedQueryShardId
class ShardedQueryShardId(ShardedQuery):
    def _execute_and_instances(self, context):
        if self._shard_id is not None:
            result = self.session.connection(
                            mapper=self._mapper_zero(),
                            shard_id=self._shard_id).execute(context.statement, self._params)
            news = list(self.instances(result, context))
            for new in news:
                new.shard_id = self._shard_id
            return iter(news)
        else:
            partial = []
            for shard_id in self.query_chooser(self):
                result = self.session.connection(
                            mapper=self._mapper_zero(),
                            shard_id=shard_id).execute(context.statement, self._params)
                news = list(self.instances(result, context))
                for new in news:
                    new.shard_id = shard_id
                partial = partial + news
            # if some kind of in memory 'sorting'
            # were done, this is where it would happen
            return iter(partial)
create_session = sessionmaker(class_=ShardedSessionShardId)

Another problem is that I had to make each result be included in the query, even if two results from different DBs have the same primary key. I achieved this by changing two classes: WeakInstanceDict, and Mapper. For using the new WeakInstanceDict, I had again to change the ShardedSession variation:

class WeakInstanceDictNoIdentity(WeakInstanceDict):
    def add(self, state):
        # if state.key in self:
        #     if dict.__getitem__(self, state.key) is not state:
        #         raise AssertionError("A conflicting state is already "
        #                             "present in the identity map for key %r"
        #                             % (state.key, ))
        # else:
            dict.__setitem__(self, state.key, state)
            self._manage_incoming_state(state)
class ShardedSessionShardId(ShardedSession):
    def __init__(self, *args, **kwargs):
        super(ShardedSessionShardId, self).__init__(*args, **kwargs)
        self._query_cls = ShardedQueryShardId
        self._identity_cls = WeakInstanceDictNoIdentity
        self.identity_map = self._identity_cls()

To start using the new Mapper, I simply replaced each call to mapper with MapperNoIdentity:

class MapperNoIdentity(Mapper):
    def _instance_processor(self, context, path, adapter,
                                polymorphic_from=None, extension=None,
                                only_load_props=None, refresh_state=None,
                                polymorphic_discriminator=None):
        """Produce a mapper level row processor callable
           which processes rows into mapped instances."""
        pk_cols = self.primary_key
        if polymorphic_from or refresh_state:
            polymorphic_on = None
        else:
            if polymorphic_discriminator is not None:
                polymorphic_on = polymorphic_discriminator
            else:
                polymorphic_on = self.polymorphic_on
            polymorphic_instances = util.PopulateDict(
                                        self._configure_subclass_mapper(
                                                context, path, adapter)
                                        )
        version_id_col = self.version_id_col
        if adapter:
            pk_cols = [adapter.columns[c] for c in pk_cols]
            if polymorphic_on is not None:
                polymorphic_on = adapter.columns[polymorphic_on]
            if version_id_col is not None:
                version_id_col = adapter.columns[version_id_col]
        identity_class = self._identity_class
        def identity_key(row):
            return identity_class, tuple([row[column] for column in pk_cols])
        new_populators = []
        existing_populators = []
        load_path = context.query._current_path + path
        def populate_state(state, dict_, row, isnew, only_load_props):
            if isnew:
                if context.propagate_options:
                    state.load_options = context.propagate_options
                if state.load_options:
                    state.load_path = load_path
            if not new_populators:
                new_populators[:], existing_populators[:] = \
                                    self._populators(context, path, row,
                                                        adapter)
            if isnew:
                populators = new_populators
            else:
                populators = existing_populators
            if only_load_props:
                populators = [p for p in populators
                                if p[0] in only_load_props]
            for key, populator in populators:
                populator(state, dict_, row)
        session_identity_map = context.session.identity_map
        if not extension:
            extension = self.extension
        translate_row = extension.get('translate_row', None)
        create_instance = extension.get('create_instance', None)
        populate_instance = extension.get('populate_instance', None)
        append_result = extension.get('append_result', None)
        populate_existing = context.populate_existing or self.always_refresh
        if self.allow_partial_pks:
            is_not_primary_key = _none_set.issuperset
        else:
            is_not_primary_key = _none_set.issubset
        def _instance(row, result):
            if translate_row:
                ret = translate_row(self, context, row)
                if ret is not EXT_CONTINUE:
                    row = ret
            if polymorphic_on is not None:
                discriminator = row[polymorphic_on]
                if discriminator is not None:
                    _instance = polymorphic_instances[discriminator]
                    if _instance:
                        return _instance(row, result)
            # determine identity key
            if refresh_state:
                identitykey = refresh_state.key
                if identitykey is None:
                    # super-rare condition; a refresh is being called
                    # on a non-instance-key instance; this is meant to only
                    # occur within a flush()
                    identitykey = self._identity_key_from_state(refresh_state)
            else:
                identitykey = identity_key(row)
            # instance = session_identity_map.get(identitykey)
            # if instance is not None:
            #     state = attributes.instance_state(instance)
            #     dict_ = attributes.instance_dict(instance)
            #     isnew = state.runid != context.runid
            #     currentload = not isnew
            #     loaded_instance = False
            #     if not currentload and \
            #             version_id_col is not None and \
            #             context.version_check and \
            #             self._get_state_attr_by_column(
            #                     state,
            #                     dict_,
            #                     self.version_id_col) != \
            #                             row[version_id_col]:
            #         raise orm_exc.ConcurrentModificationError(
            #                 "Instance '%s' version of %s does not match %s"
            #                 % (state_str(state),
            #                     self._get_state_attr_by_column(
            #                                 state, dict_,
            #                                 self.version_id_col),
            #                         row[version_id_col]))
            # elif refresh_state:
            if refresh_state:
                # out of band refresh_state detected (i.e. its not in the
                # session.identity_map) honor it anyway.  this can happen
                # if a _get() occurs within save_obj(), such as
                # when eager_defaults is True.
                state = refresh_state
                instance = state.obj()
                dict_ = attributes.instance_dict(instance)
                isnew = state.runid != context.runid
                currentload = True
                loaded_instance = False
            else:
                # check for non-NULL values in the primary key columns,
                # else no entity is returned for the row
                if is_not_primary_key(identitykey[1]):
                    return None
                isnew = True
                currentload = True
                loaded_instance = True
                if create_instance:
                    instance = create_instance(self,
                                                context,
                                                row, self.class_)
                    if instance is EXT_CONTINUE:
                        instance = self.class_manager.new_instance()
                    else:
                        manager = attributes.manager_of_class(
                                                instance.__class__)
                        # TODO: if manager is None, raise a friendly error
                        # about returning instances of unmapped types
                        manager.setup_instance(instance)
                else:
                    instance = self.class_manager.new_instance()
                dict_ = attributes.instance_dict(instance)
                state = attributes.instance_state(instance)
                state.key = identitykey
                # manually adding instance to session.  for a complete add,
                # session._finalize_loaded() must be called.
                state.session_id = context.session.hash_key
                session_identity_map.add(state)
            if currentload or populate_existing:
                if isnew:
                    state.runid = context.runid
                    context.progress[state] = dict_
                if not populate_instance or \
                        populate_instance(self, context, row, instance,
                            only_load_props=only_load_props,
                            instancekey=identitykey, isnew=isnew) is \
                            EXT_CONTINUE:
                    populate_state(state, dict_, row, isnew, only_load_props)
            else:
                # populate attributes on non-loading instances which have
                # been expired
                # TODO: apply eager loads to un-lazy loaded collections ?
                if state in context.partials or state.unloaded:
                    if state in context.partials:
                        isnew = False
                        (d_, attrs) = context.partials[state]
                    else:
                        isnew = True
                        attrs = state.unloaded
                        # allow query.instances to commit the subset of attrs
                        context.partials[state] = (dict_, attrs)
                    if not populate_instance or \
                            populate_instance(self, context, row, instance,
                                only_load_props=attrs,
                                instancekey=identitykey, isnew=isnew) is \
                                EXT_CONTINUE:
                        populate_state(state, dict_, row, isnew, attrs)
            if loaded_instance:
                state._run_on_load(instance)
            if result is not None and \
                        (not append_result or
                            append_result(self, context, row, instance,
                                    result, instancekey=identitykey,
                                    isnew=isnew)
                                    is EXT_CONTINUE):
                result.append(instance)
            return instance
        return _instance

I had to include some auxiliary definitions to make the rewrites work:

_none_set = frozenset([None])
_runid = 1L
_id_lock = util.threading.Lock()
def _new_runid():
    global _runid
    _id_lock.acquire()
    try:
        _runid += 1
        return _runid
    finally:
        _id_lock.release()

It would be good to be able to set these identity requirements as a parameter. My last problem was selecting more than one database to search. setshard only worked for one, so I created a new field in query, called shards, and checked for it on querychooser:

def query_chooser(query):
    try:
        return query.shards
    except AttributeError:
        pass
    return tcs.keys()

So, when I want to look only in a list of shards, I set this field. I'm aware that this is not a recommended python idiom, but, well, it works fine.

28 November 2011

Dirk Eddelbuettel: A Story of Life and Death. On CRAN. With Packages.

The Comprehensive R Archive Network, or CRAN for short, has been a major driver in the success and rapid proliferation of the R statistical language and environment. CRAN currently hosts around 3400 packages, and is growing at a rapid rate. Not too long ago, John Fox gave a keynote lecture at the annual R conference and provided a lot of quantitative insight into R and CRAN---including an estimate of an incredible growth rate of 40% as a near-perfect straight line on a log-log chart! So CRAN does in fact grow exponentially. (His talk morphed into this paper in the R Journal, see figure 3 for this chart.) The success of CRAN is due to a lot of hard work by the CRAN maintainers, lead for many years and still today by Kurt Hornik whose dedication is unparalleled. Even at the current growth rate of several packages a day, all submissions are still rigorously quality-controlled using strong testing features available in the R system. And for all its successes, and without trying to sound ungrateful, there have always been some things missing at CRAN. It has always been difficult to keep a handle on the rapidly growing archive. Task Views for particular fields, edited by volunteers with specific domain knowledge (including yours truly) help somewhat, but still cannot keep up with the flow. What is missing are regular updates on packages. What is also missing is a better review and voting system (and while Hadley Wickham mentored a Google Summer of Code student to write CRANtastic, it seems fair to say that this subproject didn't exactly take off either). Following useR! 2007 in Ames, I decided to do something and noodled over a first design on the drive back to Chicago. A weekend of hacking lead to CRANberries. CRANberries uses existing R functions to learn which packages are available right now, and compares that to data stored in a local SQLite database. This is enough to learn two things: First, which new packages were added since the last run. That is very useful information, and it feeds a website with blog subscriptions (for the technically minded: an RSS feed, at this URL). Second, it can also compare current versions numbers with the most recent stored version number, and thereby learns about updated packages. This too is useful, and also feeds a website and RSS stream (at this URL; there is also a combined one for new and updated packages.) CRANberries writes out little summaries for both new packages (essentially copying what the DESCRIPTION file contains), and a quick diffstat summary for updated packages. A static blog compiler munges this into static html pages which I serve from here, and creates the RSS feed data at the same time. All this has been operating since 2007. Google Reader tells me the the RSS feed averages around 137 posts per week, and has about 160 subscribers. It does feed to Planet R which itself redistributes so it is hard to estimate the absolute number of readers. My weblogs also indicate a steady number of visits to the html versions. The most recent innovation was to add tweeting earlier in 2011 under the @CRANberriesFeed Twitter handle. After all, the best way to address information overload and too many posts in our RSS readers surely is to ... just generate more information and add some Twitter noise. So CRANberries now tweets a message for each new package, and a summary message for each set of new packages (or several if the total length exceeds the 140 character limit). As of today, we have sent 1723 tweets to what are currently 171 subscribers. Tweets for updated packages were added a few months later. Which leads us to today's innovation. One feature which has truly been missing from CRAN was updates about withdrawn packages. Packages can be withdrawn for a number of reasons. Back in the day, CRAN carried so-called bundles carrying packages inside. Examples were VR and gregmisc. Both had long been split into their component packages, making VR and gregmisc part of the set of packages no longer on the top page of CRAN, but only its archive section. Other examples are packages such as Design, which its author Frank Harrell renamed to rms to match to title of the book covering its methodology. And then there are of course package for which the maintainer disappeared, or lost interest, or was unable to keep up with quality requirements imposed by CRAN. All these packages are of course still in the Archive section of CRAN. But how many packages did disappear? Well, compared to the information accumulated by CRANberries over the years, as of today a staggering 282 packages have been withdrawn for various reasons. And at least I would like to know more regularly when this happens, if only so I have a chance to see if the retired package is one the 120+ packages I still look after for Debian (as happened recently with two Rmetrics packages). So starting with the next scheduled run, CRANberries will also report removed packages, in its own subtree of the website and its own RSS feed (which should appear at this URL). I made the required code changes (all of about two dozen lines), and did some light testing. To not overwhelm us all with line noise while we catch up to the current steady state of packages, I have (temporarily) lowered the frequency with which CRANberries is called by cron. I also put a cap on the number of removed packages that are reported in each run. As always with new code, there may be a bug or two but I will try to catch up in due course. I hope this is of interest and use to others. If so, please use the RSS feeds in your RSS readers, and subscribe to the @CRANberriesFeed, And keep using CRAN, and let's all say thanks to Kurt, Stefan, Uwe, and everybody who is working on CRAN (or has been in the past).

23 August 2011

Rudy Godoy: Compute clusters for Debian development and building final report

Compute clusters for Debian development and building final report Summary
-
The goal of this project was intended to have Eucalyptus cloud to
support ARM images so it allows Debian developers and users to be able
to use such facility for taks such package building, software
development (ie. Android) under a Debian pre-set image,
software testing many others. What was expected to have at the end is a modified version of Eucalyptus
Cloud that supports ARM images on the first place. To date this goal has
been reached but is not complete, read production-ready. Extensive test
needs to be done. Besides that we have another goal which is to get
Debian community to use this new extended tool. Project overview
-
Eucalyptus is a hybrid cloud computing platform which has a Open Source
(FLOSS) version. It s targetted to be used for PaaS (Platform as a
Service), IaaS (Infrastructure as a Service) and other distribution
models. It can be used also for cloud management. Given that it has
implemented the EC2, for computing, and the S3, for storage, API, it s
compatible with existing public clouds such Amazon EC2. Currently it
supports running i386 and amd64 images or NC (Node Controllers) in
Eucalyptus naming. Eucalyptus is a complex piece of software. It s architechture it s
modular composed by five components. Cloud Controller (CLC), Walrus (W),
Storage Controller (SC), Cluster Controller (CC) and Node Controller
(NC). The first three are written in Java and the remaining are written
in C. Our project modifications was targetted to the NC component,
altough there is a remaining task for hacking the UI to allow setting
the arch of the uploaded NC image instance. The Node Controller is in charge of setting the proper information for
Eucalyptus to be able to run virtualized images. Eucalyptus uses XEN and
KVM hyphervisor to handle internal virtualization subsystem, and
Libvirt, which is an abstraction of the existing virtualization
libraries, for program interfacing. Having that in mind we are back to our project. For our project to be
sucessful we had various tasks to do. Beginning with understanding such
complex piece of software, followed by hacking the requiered bits and
later integrate the work so it results in a useful tool. Our approach to
the project was then inline with such description. The project s history
I started my participation in GSoC approaching Steffen to discuss about
what was expected, or more accurately, what was in his mind. We
exchanged emails previous to my application and that resulted on my
application being submitted. From the beginning I should tell that I
wasn t much clear on the final product of the project, however we
refined ideas and goals during the weeks following to the official
begining of the project. What was clear for me, after all, is that for
this to see the light and gain adoption Eucalyptus code needed to be
dealt with. Our first task was review ARM image creation from scratch, using the
work done by Aurelien and Dominique in previous GSoC. I ve managed to
get an updated ARM image running under qemu-system-arm. The issues
presented in the past were almost unexistant now that the versatile
kernel is official. You can see this on my first report. After that being done and sometimes in parallel my main goal was then understand
the internals of the Eucalyptus software in order to figure out it s
feasibility, where do we need to extend and how big the task is, we
didn t knew upfront. From the beginning of the project Steffen was kind
to introduce me to the Eucalyptus developers and that have resulted in a
good outcome for Debian, IMHO, to date. Understanding Eucalyptus internals was quite a fun task to say the less. As you
can see on the first part of the report Eucalyptus is modularized and
the component I was expected to work with was the NC (Node
Controller). Given that I isolated my work focus, I started to focus on
learning the internals about that component. Node Controller as described is in charge of talking with
virtualization hyphervisors in order to arrange things to run guest
image instances. The module has basically one main component: handler.c,
in charge to talk to the appropiate hyphervisor and there are
extensions (think of OOP polymorphism but acknoledge it s plain C)
that interact with KVM or XEN. I figured that if qemu-kvm is ran in Hyphervisor mode we can manage to
run an ARM image with qemu-system-arm. Given that Eucalyptus has
interaction with KVM hyphervisor in place, this answered the question of
the project s feasibility. First green light on.
>From this point the scope reduced to interacting with the KVM handler
(node/handler_kvm.c) and extend it to support running an image that is
not amd64 or i386. NC makes use of libvirt to abstract interaction with Hyphervisors. So,
the next phase was learning about it s API and figure what s needed to
be modified next. Libvirt uses a XML file for setting Domain (in
libvirt s naming) definitions. So, Eucalyptus provides a Perl wrapper to
generate this file on runtime and allow the NC to invoke libvirt s API
to run the instance. Next task then, was adapt such script to support
ARM arch. The current script is taiolred for amd64 and i383. I worked on
that front and managed to get a script prototype, that can later be
improved and support more arch s. Generating an adequate Libvirt s XML Domain definition file for ARM can
be a heroic task. There are many things to have in mind given the
diversity of the ARM processor and vendors. I focused on the versatile
flavour given that I was going to test the image I built on the first
place and it ran fine under qemu-system-arm.
The Perl script wrapper then was adapted for such configuration and it
can be tested independently by issuing the following command: $ tools/gen_kvm_libvirt_xml arch arm The arch parameter is what I implemented and it s intended to be
called from the kvm handler on instance creation. With the extensibility
in mind I ve created a hash to associate the arch with their
corresponding emulator. our %arches = (
amd64 => /usr/bin/kvm ,
arm => /usr/bin/qemu-system-arm ,
); Added the arch parameter to the GetOptions() function and used
conditions to tell whether the user is looking for a particular arch,
arm in our case. The most important parts to be considered are the
, and entries. hvm $local_kvm There s also and that require to be tailored for arm. root=0800 As output the script generates an XML template that can be adapted to
your needs and it could be used and tested with tools like Libvirt s
virsh. So that, it can be useful independently of Eucalyptus. I managed
to get to this point before the midterm evaluation. Now that we had the XML wrapper almost done, the next task was to make
the handler call pass the arch as an argument to the script, so the
image is loaded with the proper settings and we are able to run it. Eucalyptus doesn t have a arch field for NC s instances. So, after approaching
Eucalyptus developers, with whom we had already being interacting, I
settled to my proposal of extending the ncInstance struct with an arch
field (util/data.h). The ncInstance_t struct stores the metadata for the
instance being created. It s used for storing runtime data as well as
network configuration and more. It was indeed the right place to add a
new field. I did so by creating the archId field.
Now I needed to make sure the arch information is stored and later used on the
libvirt s call. typedef struct ncInstance_t
char instanceId[CHAR_BUFFER_SIZE];
char imageId[CHAR_BUFFER_SIZE];
char imageURL[CHAR_BUFFER_SIZE];
char kernelId[CHAR_BUFFER_SIZE];
char kernelURL[CHAR_BUFFER_SIZE];
char ramdiskId[CHAR_BUFFER_SIZE];
char ramdiskURL[CHAR_BUFFER_SIZE];
char reservationId[CHAR_BUFFER_SIZE];
char userId[CHAR_BUFFER_SIZE];
char archId[CHAR_BUFFER_SIZE];
int retries; /* state as reported to CC & CLC */
char stateName[CHAR_BUFFER_SIZE]; /* as string */
int stateCode; /* as int */ /* state as NC thinks of it */
instance_states state; char keyName[CHAR_BUFFER_SIZE*4];
char privateDnsName[CHAR_BUFFER_SIZE];
char dnsName[CHAR_BUFFER_SIZE];
int launchTime; // timestamp of RunInstances request arrival
int bootTime; // timestamp of STAGING->BOOTING transition
int terminationTime; // timestamp of when resources are released (->TEARDOWN transition) virtualMachine params;
netConfig ncnet;
pthread_t tcb; /* passed into NC via runInstances for safekeeping */
char userData[CHAR_BUFFER_SIZE*10];
char launchIndex[CHAR_BUFFER_SIZE];
char groupNames[EUCA_MAX_GROUPS][CHAR_BUFFER_SIZE];
int groupNamesSize; /* updated by NC upon Attach/DetachVolume */
ncVolume volumes[EUCA_MAX_VOLUMES];
int volumesSize;
ncInstance; With that in place what was left was modifying the functions that
store/update the ncInstance data, and also the libvirt s function that
calls the gen_kvm_libvirt_xml script. I ve modified the following files:
- util/data.c
- node/handlers_kvm.c
- node/handlers. c,h
- node/test.c
- node/client-marshall-adb.c Most important the allocate_instance() function that is in charge of setting up
the instance metadata a prepare it to pass it to the handler, then
hyphervisor trough Libvirt s API. The function now has a new archId
parameter as well, to keep coherence with the field name. It also
handles whether the field is set or not. We (Eucalyptus developers and
I) haven t settled wether to initialize this field with a default value
or not. I d stepped up and initialized with a NULL value for this string
type var. if (archId != NULL)
strncpy(inst->archId, archId, CHAR_BUFFER_SIZE);
Stores the value only if it s passed to the function. This is essential to
don t break existing functionality and keep consistency for later releases.
With this we almost finished the part of extending Eucalyptus to support
arm images. It took quite long to get to that point. Next step: test. As I mentioned in my previous report the current Eucalyptus packaging
has issues. From the beginning I ve approached the pkg-eucalyptus team,
who were very hepful. So I myself set me up and joined the pkg team with
commit access to the repo. Even the GSoC is not intended for packaging
labour, I needed to clear things on that side because what we want is
adoption so anyone, say a DSA or a user, could setup a cloud that has
support for running arm instances under amd64 arch. Over the weeks between mid-term evaluation and past week I ve worked on
that front. Results were good, as you can see in the pkg-eucalyptus
mailing list and SVN repo.
Eucalyptus has a couple of issues that because of it s nature and complexity
triggered build errors. Those issues came from the Java side of the project and
few form the C-side. The C-part was a problem with AXIS2 Jar used to generate
stubs for later inclusion on build-time. There s no definitive solution to date
because one of the Makefile on gatherlog/Makefile issues a subshell call that
doesn t catches the AXIS_HOME environment variable. I ve worked around this
by defining it as a session env var from .zshrc. The other problem is related
to the Java libary versioning and most likely (we had to investigate further)
to groovy versioning. As I explained Eucalyptus has 3 components written in Java. The Java code uses
extensively the AXIS2 library for implementing webservices calls, the groovy
language for JVM, and many other Java libaries. Eucalyptus open source version
ships both online and offline versions. The difference is basically that
the offline version ships the Java libaries JAR files inside the source code,
so you can build with all required deps. Our former packaging used a file to list which Java libraries the package
depends on and it symlinked them using system wide installed packages. This
rather than helping out triggered a lot of problems I explaing with more detail
on the pkg-eucalyptus ml. I worked around this by telling the debian/rules
not to use such file to symlink and instead use the existing ones from
the source. The last weekend I managed to get to that point and built
sucessfully the software under Debian. I ve done that in order to have a
complete build-cycle for my changes, and then test. In fact today I ve
spotted and fixed a couple of bugs there. What s available now

To date we have a Eucalyptus branch where my patches are sent. The level of
extension is nearly complete, more testing needed. One missing bit is to
change the UI so that the user selects which arch her image is and that
value is passed to the corresponding functions on the Node Controller
component. We also have an ARM image that can be tested and later automated. I ve
recently learned about David Went s VMBuilder repo that extends Ubuntu s
vmbuilder to support image creationg. I might talk to him to adapt the tool
in order to support creation of arm images and use it to upload to an
Eucalyptus cloud. I ve also created a wiki page with different bits on which I was working on.
I ve yet need to craft it to be more educational. I guess I m going to use
this report as a source :) We also had more strong cooperation with the Eucalyptus project and
involved their developers on the packaging and this project, which is
something I consider a great outcome for this project and GSoC goals:
to attract more people to contribute to FLOSS. I ve also been contacted by
people from the Panda Project and we might cooperate also. Future work

I plan to keep working on the project regardless the GSoC ends. Our goal
as Debian could be integrating this into the Eucalyptus upstream
branch. We have good relations with them so expect news from that side.
I also plan to keep working with the Eucalyptus team and related
software in Debian, given that I m familiarized with the tools and
projects.
I m also planning to advocate SoC for Debian in my faculty. To date we
had already 4 students, IIRC, that have participated in the past. Project challenges and lessons
During the project I faced many challenges both in the personal side
and in the technical one. This part is a bit personal but I expect we
learn from this. First challenge was the where do I sit problem. My project was
particular, indeed quite different from the others on it s nature. I had
to work on a software that is not Debian s and then come and say Hey we
have this nice tool for you, come test it! . So, indeed, my focus was on
the understanding of such tool, Eucalyptus, and not much on looking back
at Debian because I didn t feel I had something to advertise yet and
indeed I got most valuable/useful feedback from Eucalyptus developers
rather than Debian s, which for the case is OK IMHO. I had great
mentoring. This situation probably looked a bit of isolation from my
side, from Debian s POV, but it was not intended it s just how it needed
to be IMHO. I felt that I wasn t contributing directly to the
project. I ve spoke about such thing with Steffen a couple of times. I d
like to say that it was also my concern but I probably failed on
communicating this, second challenge. Third challenge was more technical and related to the first one. Since I
was not writing any code for a Debian native package/program but instead for
other project I faced the situation of where to publish my
contributions. Eucalyptus had some issues on managing their FLOSS repo
and the current development one is outdated comparing to the
-src-offline 2.0.3 release. The project didn t fit pkg-eucalyptus repo
neither, even we thought to branch the existing trunk and ship my
contribs as quilt patches, but it involved a lot of more work and burn
on reworking something that has issues now. Later on time I branched the
bzr eucalyptus-devel Launchpad repo and synced there with the
src-offline code. I was slow to react on thing I can say. Fourth challenge was also technical and still dealing with. I was looking to
setup a machine to set a cloud and test my branch. The Dean of my faculty
kindly offered a machine but I ve never managed to get things arranged
with the technical team. Steffen and I discussed about this and evaluated
options, at the end we settled on pushing adoption rather than showcasing it.
Since for adoption we need a use case, I ve spend some SoC money on hardware
for doing this. I should have news this week on that side :) Finally, I d like to thank everyone on the Debian SoC team for this
opportunity to participate. I like to thank Steffen for his effort on
arranging things for me, Graziano from Eucalyptus for his advice on
working with the software code. The different people from Eucalyptus I
interacted with and showed interest on the project s sucess. My
classmates and teachers from the Computer Science School at UCSP in
Arequipa-Peru, friends from the computing community in Peru SPC and
finally my family. Thank you, I learned a lot, specially on the personal
side. As Randy Pausch put: The brick walls are there to give us a
chance to show how badly we want something. Resources
Please see the Wiki page[1] for any reference to the project s resources. 1- http://wiki.debian.org/SummerOfCode2011/BuildWithEucalyptus/ProjectLog Best regards,
Rudy

Next.

Previous.